Tutorials
=========

This section provides step-by-step tutorials on how to use and customize cooperative co-evolutionary algorithms (CCEAs) for feature selection with the PyCCEA package.

.. contents::
   :local:
   :depth: 2
   :backlinks: entry

How to use a baseline CCEA?
---------------------------

This tutorial demonstrates how to use the ``CCFSRFG1`` algorithm, a CCEA variant with random feature grouping, to perform feature selection on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.

In this example, you will:

- Load the dataset configuration file and prepare the data with the ``DataLoader``.
- Load the algorithm configuration file.
- Run the optimization process and retrieve the selected features.

We start by importing the necessary modules and classes:

.. code-block:: python

    import toml
    import importlib.resources

    from pyccea.coevolution import CCFSRFG1
    from pyccea.utils.datasets import DataLoader

- ``toml`` is used to parse configuration files.
- ``importlib.resources`` provides access to data files shipped inside a package.
- ``CCFSRFG1`` is the cooperative co-evolutionary algorithm you will run.
- ``DataLoader`` is a utility that prepares the dataset.

Next, we load the dataset configuration from a ``.toml`` file:

.. code-block:: python

    with importlib.resources.open_text("pyccea.parameters", "dataloader.toml") as toml_file:
        data_conf = toml.load(toml_file)

This code looks up the file ``dataloader.toml`` inside the ``pyccea.parameters`` package and loads its contents into a dictionary.

We then create and prepare the dataset using the ``DataLoader``:

.. code-block:: python

    dataloader = DataLoader(dataset="wdbc", conf=data_conf)
    dataloader.get_ready()

- ``dataset="wdbc"`` specifies the dataset to use (WDBC in this case).
- ``conf=data_conf`` passes the parameters we just loaded.
- ``get_ready()`` performs the necessary preprocessing and splits the data.

We now load the configuration for the ``CCFSRFG1`` algorithm:

.. code-block:: python

    with importlib.resources.open_text("pyccea.parameters", "ccfsrfg.toml") as toml_file:
        ccea_conf = toml.load(toml_file)

This step reads the ``ccfsrfg.toml`` file and loads the algorithm's hyperparameters into a dictionary.

Now we are ready to run the algorithm:

.. code-block:: python

    ccea = CCFSRFG1(data=dataloader, conf=ccea_conf, verbose=False)
    ccea.optimize()

- ``CCFSRFG1(...)`` initializes the algorithm with the data and configuration.
- ``optimize()`` starts the evolutionary process for feature selection.

Once the optimization is complete, the best feature subset is stored in the ``best_context_vector`` attribute of the ``CCFSRFG1`` object. This attribute is a binary vector where:

- ``1`` indicates that a feature is selected.
- ``0`` indicates that a feature is excluded.

Some CCEAs, like ``CCFSRFG1``, may reorder the features during the decomposition phase. In these cases, you must map the selected features back to their original names using the reordered indices stored in ``feature_idxs``:

.. code-block:: python

    # Get the original feature column names
    feature_cols = dataloader.data.columns

    # Reorder the feature names according to the algorithm's internal feature order
    reordered_features = feature_cols[ccea.feature_idxs]

    # Select the features where best_context_vector == 1
    selected_features = reordered_features[ccea.best_context_vector.astype(bool)].tolist()

If you experiment with other CCEAs, such as ``CCPSTFG``, be aware that some features may be removed before optimization. In that case, filter out the removed features (listed in ``removed_features``) before reordering. Sorting the kept indices preserves the original column order, which a plain ``set`` does not guarantee:

.. code-block:: python

    # Indices of the features that were not removed, in their original order
    kept_feature_indices = sorted(set(range(len(feature_cols))) - set(ccea.removed_features))

    # Select the original feature names corresponding to the kept indices
    kept_feature_names = feature_cols[kept_feature_indices]

    # Reorder the kept features according to the algorithm's internal feature order
    reordered_features = kept_feature_names[ccea.feature_idxs]

    # Select the features where best_context_vector == 1
    selected_features = reordered_features[ccea.best_context_vector.astype(bool)].tolist()

Feel free to modify the configuration files or try different CCEAs to explore other scenarios.

How to customize a CCEA?
------------------------

You can implement your own cooperative co-evolutionary algorithm in PyCCEA by subclassing the :py:class:`pyccea.coevolution.ccea.CCEA` base class and customizing its core components. Your new class should implement or override the following methods:

- ``_init_collaborator()``: defines how individuals from different subpopulations collaborate.
- ``_init_evaluator()``: specifies how candidate solutions are evaluated.
- ``_init_subpop_initializer()``: sets up how the initial subpopulations are generated.
- ``_init_optimizers()``: chooses the evolutionary algorithm used to evolve each subpopulation.
- ``_init_decomposer()``: defines the decomposition strategy (feature grouping).
- ``optimize()``: runs the main optimization loop.

We will use ``CCEAFS`` (Cooperative Co-Evolutionary-Based Feature Selection) as a concrete example to illustrate the necessary steps.

1. **_init_collaborator**

   This method instantiates the collaboration strategies. These strategies define how individuals from different subpopulations are combined into a complete candidate solution (context vector) for evaluation. For example:

   .. code-block:: python

       def _init_collaborator(self):
           """Instantiate collaboration method."""
           self.best_collaborator = SingleBestCollaboration()
           self.random_collaborator = SingleRandomCollaboration(seed=self.seed)

   - ``SingleBestCollaboration()`` typically pairs an individual with the best individual of each of the other subpopulations. It is used when evaluating evolved individuals.
   - ``SingleRandomCollaboration(seed=self.seed)`` selects random collaborators. It is useful during initialization to encourage diversity.

2. **_init_evaluator**

   This method sets up the fitness function that evaluates complete candidate solutions and guides the evolutionary process. For example:

   .. code-block:: python

       def _init_evaluator(self):
           """Instantiate evaluation method."""
           evaluator = WrapperEvaluation(
               task=self.conf["wrapper"]["task"],
               model_type=self.conf["wrapper"]["model_type"],
               eval_function=self.conf["evaluation"]["eval_function"],
               eval_mode=self.eval_mode,
               n_classes=getattr(self.data, "n_classes", None)
           )
           self.fitness_function = SubsetSizePenalty(
               evaluator=evaluator,
               weights=self.conf["evaluation"]["weights"]
           )

   Here, ``WrapperEvaluation`` wraps the evaluation of a model (configured by ``task`` and ``model_type``) on the selected features. ``SubsetSizePenalty`` combines that score, according to ``weights``, with a penalty term that encourages compact feature subsets.

3. **_init_subpop_initializer**

   This method defines how the initial subpopulations are generated, which marks the start of co-evolution. For example:

   .. code-block:: python

       def _init_subpop_initializer(self):
           """Instantiate subpopulation initialization method."""
           self.initializer = RandomBinaryInitialization(
               data=self.data,
               subcomp_sizes=self.subcomp_sizes,
               subpop_sizes=self.subpop_sizes,
               collaborator=self.random_collaborator,
               fitness_function=self.fitness_function
           )

   ``RandomBinaryInitialization`` initializes each individual as a random binary vector. Each bit marks whether the corresponding feature is included or excluded.

4. **_init_optimizers**

   This method instantiates the evolutionary optimizers that independently evolve each subpopulation. For example:

   .. code-block:: python

       def _init_optimizers(self):
           """Instantiate evolutionary algorithms to evolve each subpopulation."""
           self.optimizers = []
           for i in range(self.n_subcomps):
               optimizer = BinaryGeneticAlgorithm(
                   subpop_size=self.subpop_sizes[i],
                   n_features=self.subcomp_sizes[i],
                   conf=self.conf
               )
               self.optimizers.append(optimizer)

   Each subpopulation gets its own ``BinaryGeneticAlgorithm``, a genetic algorithm tailored to binary representations. Each instance is sized to its subpopulation and subcomponent.

5. **_init_decomposer**

   This method defines how the original problem is split into subproblems (subcomponents), each handled by a separate subpopulation. For example:

   .. code-block:: python

       def _init_decomposer(self):
           """Instantiate feature grouping method."""
           self.decomposer = SequentialFeatureGrouping(
               n_subcomps=self.n_subcomps,
               subcomp_sizes=self.subcomp_sizes
           )

   ``SequentialFeatureGrouping`` divides the features into groups of consecutive features. Decomposition strategies vary across CCEAs and can strongly affect algorithm performance.

6. **optimize**

   This is the main method. It orchestrates the entire co-evolutionary optimization workflow, including decomposition, initialization, evolution, evaluation, and convergence checks. It uses the standard library ``copy`` module, so your module needs ``import copy`` at the top.

   .. code-block:: python

       def optimize(self):
           """Solve the feature selection problem through optimization."""
           self._problem_decomposition()
           self._init_subpopulations()
           self._init_optimizers()

           self.current_best = self._get_best_individuals(
               subpops=self.subpops,
               fitness=self.fitness,
               context_vectors=self.context_vectors
           )
           self.best_context_vector, self.best_fitness = self._get_global_best()
           self.best_context_vectors.append(self.best_context_vector.copy())
           self.best_feature_idxs = self.feature_idxs.copy()

           n_gen = 0
           stagnation_counter = 0
           while n_gen <= self.conf["coevolution"]["max_gen"]:
               self.convergence_curve.append(self.best_fitness)

               # Evolve subpopulations independently
               current_subpops = [
                   self.optimizers[i].evolve(self.subpops[i], self.fitness[i])
                   for i in range(self.n_subcomps)
               ]

               # Evaluate evolved individuals collaboratively
               current_fitness, current_context_vectors = [], []
               for i in range(self.n_subcomps):
                   current_fitness.append([])
                   current_context_vectors.append([])
                   for j in range(self.subpop_sizes[i]):
                       collaborators = self.best_collaborator.get_collaborators(
                           subpop_idx=i,
                           indiv_idx=j,
                           current_subpops=current_subpops,
                           current_best=self.current_best
                       )
                       context_vector = self.best_collaborator.build_context_vector(collaborators)
                       current_context_vectors[i].append(context_vector.copy())
                       fitness_value = self.fitness_function.evaluate(context_vector, self.data)
                       current_fitness[i].append(fitness_value)

               # Update subpopulations and fitness
               self.subpops = copy.deepcopy(current_subpops)
               self.fitness = copy.deepcopy(current_fitness)
               self.context_vectors = copy.deepcopy(current_context_vectors)

               self.current_best = self._get_best_individuals(
                   subpops=self.subpops,
                   fitness=self.fitness,
                   context_vectors=self.context_vectors
               )
               best_context_vector, best_fitness = self._get_global_best()

               # Track the global best and stop after too many generations without improvement
               if self.best_fitness < best_fitness:
                   stagnation_counter = 0
                   self.best_context_vector = best_context_vector.copy()
                   self.best_context_vectors.append(self.best_context_vector.copy())
                   self.best_fitness = best_fitness
               else:
                   stagnation_counter += 1
                   if stagnation_counter >= self.conf["coevolution"]["max_gen_without_improvement"]:
                       break

               n_gen += 1

   The ``optimize`` method follows the typical CCEA workflow:

   - **Decomposition and initialization:** decompose the problem and initialize the subpopulations and optimizers.
   - **Evolution loop:** repeat until ``max_gen`` is exceeded or the best fitness fails to improve for ``max_gen_without_improvement`` consecutive generations.
   - **Subpopulation evolution:** evolve each subpopulation independently.
   - **Collaboration and evaluation:** combine individuals across subpopulations into complete solutions and evaluate their fitness.
   - **Best solution tracking:** update the global best solution and check for convergence.

By overriding these ``_init_*`` methods and ``optimize()``, you can customize each part of the co-evolutionary process to fit your problem domain. This covers collaboration, evaluation, initialization, optimization, and decomposition, as well as the optimization flow itself.
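
The toy sketch below mirrors the CCEA workflow in plain NumPy so that it runs without PyCCEA. It is an illustrative assumption, not PyCCEA's implementation or API. The names ``toy_fitness`` and ``toy_ccea``, the hidden ``target`` mask, and all parameter values are hypothetical. The sketch works as follows:

- Features are split into sequential groups.
- Each group evolves its own binary subpopulation using bit-flip mutation and truncation selection, a simple stand-in for ``BinaryGeneticAlgorithm``.
- Every candidate is scored inside a context vector that is completed with the best individuals of the other groups.
- The loop stops after ``max_gen`` generations or after ``max_gen_without_improvement`` generations without improvement.

.. code-block:: python

    import numpy as np


    def toy_fitness(context_vector, target, penalty=0.01):
        """Hypothetical fitness: agreement with a target mask minus a subset-size penalty."""
        agreement = np.mean(context_vector == target)
        return agreement - penalty * context_vector.sum() / context_vector.size


    def toy_ccea(target, n_subcomps=3, subpop_size=10, max_gen=50,
                 max_gen_without_improvement=10, seed=0):
        """Toy CCEA loop (not PyCCEA's API): returns best vector, its fitness, and the curve."""
        rng = np.random.default_rng(seed)
        n_features = target.size
        # Decomposition: split the feature indices into sequential groups
        groups = np.array_split(np.arange(n_features), n_subcomps)
        # Initialization: one random binary subpopulation per group
        subpops = [rng.integers(0, 2, size=(subpop_size, g.size)) for g in groups]
        current_best = [sp[0].copy() for sp in subpops]

        def build_context_vector(i, individual):
            # Best collaboration: the individual fills group i, other groups use their current best
            context_vector = np.empty(n_features, dtype=int)
            for k, g in enumerate(groups):
                context_vector[g] = individual if k == i else current_best[k]
            return context_vector

        best_vector = build_context_vector(0, current_best[0])
        best_fitness = toy_fitness(best_vector, target)
        convergence_curve = []
        stagnation_counter = 0
        for _ in range(max_gen):
            convergence_curve.append(best_fitness)
            for i, g in enumerate(groups):
                # Evolution: bit-flip mutation with rate 1 / group size
                flips = rng.random(subpops[i].shape) < 1.0 / g.size
                pool = np.vstack([subpops[i], np.where(flips, 1 - subpops[i], subpops[i])])
                # Evaluation: score each candidate inside a complete context vector
                scores = np.array([toy_fitness(build_context_vector(i, ind), target) for ind in pool])
                # Truncation selection: keep the best subpop_size among parents and offspring
                subpops[i] = pool[np.argsort(scores)[::-1][:subpop_size]]
                current_best[i] = subpops[i][0].copy()
            # Best solution tracking with a stagnation-based stopping rule
            candidate = build_context_vector(0, current_best[0])
            candidate_fitness = toy_fitness(candidate, target)
            if candidate_fitness > best_fitness:
                best_vector, best_fitness = candidate, candidate_fitness
                stagnation_counter = 0
            else:
                stagnation_counter += 1
                if stagnation_counter >= max_gen_without_improvement:
                    break
        return best_vector, best_fitness, convergence_curve


    target = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0])
    best_vector, best_fitness, curve = toy_ccea(target)
    print(best_vector, best_fitness, len(curve))

One design difference is worth noting. The sketch updates a group's best individual right after evolving that group, so later groups already collaborate with the updated individual. ``CCEAFS`` instead evaluates all evolved subpopulations against the previous generation's best individuals before updating them.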