Tutorials
This section provides step-by-step tutorials on how to use and customize cooperative co-evolutionary algorithms for feature selection using the PyCCEA package.
How to use a baseline CCEA?
This tutorial demonstrates how to use CCFSRFG1, a cooperative co-evolutionary algorithm (CCEA) variant with random feature grouping, to perform feature selection on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
In this example, you will:
Load the dataset and algorithm configuration files.
Prepare the dataset using the DataLoader.
Run the optimization process.
We start by importing the necessary modules and classes:
import toml
import importlib.resources
from pyccea.coevolution import CCFSRFG1
from pyccea.utils.datasets import DataLoader
toml is used to parse configuration files.
importlib.resources helps access files inside a package.
CCFSRFG1 is the cooperative co-evolution algorithm you’ll run.
DataLoader is a utility to prepare the dataset.
Next, we load the configuration for the dataset from a .toml file:
with importlib.resources.open_text("pyccea.parameters", "dataloader.toml") as toml_file:
    data_conf = toml.load(toml_file)
This code looks for the file dataloader.toml inside the pyccea.parameters package and loads its content into a dictionary.
We then create and prepare the dataset using the DataLoader:
dataloader = DataLoader(dataset="wdbc", conf=data_conf)
dataloader.get_ready()
dataset="wdbc" specifies the dataset to use (WDBC in this case).
conf=data_conf passes the parameters we just loaded.
get_ready() performs any necessary preprocessing and splits the data.
We now load the configuration for the CCFSRFG1 algorithm:
with importlib.resources.open_text("pyccea.parameters", "ccfsrfg.toml") as toml_file:
    ccea_conf = toml.load(toml_file)
This step reads the ccfsrfg.toml file and loads the algorithm’s hyperparameters into a dictionary.
Now we’re ready to run the algorithm:
ccea = CCFSRFG1(data=dataloader, conf=ccea_conf, verbose=False)
ccea.optimize()
CCFSRFG1(...) initializes the algorithm with data and configuration.
optimize() starts the evolutionary process for feature selection.
Once the optimization is complete, the best feature subset is stored in the best_context_vector attribute of the CCFSRFG1 object, a binary vector where:
1 indicates that a feature is selected.
0 indicates that a feature is excluded.
Some CCEAs, like CCFSRFG1, may reorder the features during the decomposition phase. In these cases, you must map the selected features back to their original names using the reordered indices.
# Get original feature column names
feature_cols = dataloader.data.columns
# Reorder the features according to the algorithm's internal feature order
reordered_features = feature_cols[ccea.feature_idxs]
# Select features where best_context_vector == 1
selected_features = reordered_features[ccea.best_context_vector.astype(bool)].tolist()
If you experiment with other CCEAs, such as CCPSTFG, be aware that some features may be removed before optimization. For those, you might need to filter out removed features before reordering:
# Create a set of indices for features that were not removed
kept_feature_indices = set(range(len(feature_cols))) - set(ccea.removed_features)
# Select the original feature names for the kept indices, in ascending index order
# (iterating over a set does not guarantee any particular order)
kept_feature_names = feature_cols[sorted(kept_feature_indices)]
# Reorder the kept features according to the algorithm's internal feature order
reordered_features = kept_feature_names[ccea.feature_idxs]
# Select features where best_context_vector == 1
selected_features = reordered_features[ccea.best_context_vector.astype(bool)].tolist()
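The index bookkeeping above can be checked on a toy case. The sketch below uses plain Python lists in place of the pandas and NumPy objects; the function name map_selected_features and the sample values are hypothetical. It assumes, as the snippets above do, that the reordering indices refer to positions among the kept features.

```python
def map_selected_features(feature_names, removed, order, mask):
    """Return the original names of the selected features.

    feature_names: original column names
    removed: indices (into feature_names) dropped before optimization
    order: internal feature order, as positions among the kept features
    mask: binary vector aligned with `order` (1 = selected)
    """
    kept = [name for i, name in enumerate(feature_names) if i not in removed]
    reordered = [kept[i] for i in order]
    return [name for name, bit in zip(reordered, mask) if bit == 1]


names = ["radius", "texture", "perimeter", "area", "smoothness"]
selected = map_selected_features(
    feature_names=names,
    removed={1},          # "texture" was removed before optimization
    order=[3, 0, 2, 1],   # internal order of the four kept features
    mask=[1, 0, 0, 1],    # first and last internal positions are selected
)
print(selected)
```

With removed=set(), the same function covers algorithms that only reorder features.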
Feel free to modify the configuration files or try different CCEAs to explore various scenarios.
How to customize a CCEA?
You can implement your own cooperative co-evolutionary algorithm in PyCCEA by subclassing the pyccea.coevolution.ccea.CCEA base class and customizing its core components.
To do so, your new class should implement or override the following methods:
_init_collaborator(): defines how individuals from different subcomponents collaborate.
_init_evaluator(): specifies how candidate solutions are evaluated.
_init_subpop_initializer(): sets up how initial solutions are generated.
_init_optimizers(): chooses the evolutionary algorithm used for optimization.
_init_decomposer(): defines the decomposition strategy (feature grouping).
optimize(): runs the main optimization loop.
We will use CCEAFS (Cooperative Co-Evolutionary-Based Feature Selection) as a concrete example to illustrate the necessary steps.
_init_collaborator
This method instantiates collaboration strategies that define how individuals from different subpopulations interact to form a full candidate solution (context vector) during evaluation. For example:
def _init_collaborator(self):
    """Instantiate collaboration method."""
    self.best_collaborator = SingleBestCollaboration()
    self.random_collaborator = SingleRandomCollaboration(seed=self.seed)
SingleBestCollaboration() typically selects the best individual from each cooperating subpopulation, used when evaluating evolved individuals.
SingleRandomCollaboration(seed=self.seed) selects random collaborators, useful for initialization to encourage diversity.
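As a rough illustration of the two strategies, the sketch below builds a context vector for one individual by combining it with one collaborator from each of the other subpopulations. It is a simplified stand-in written with plain lists, not PyCCEA's implementation; the function build_context_vector and its arguments are hypothetical, and it assumes the subcomponents are concatenated in order.

```python
import random


def build_context_vector(subpops, subpop_idx, indiv_idx, best_idxs=None, rng=None):
    """Concatenate one individual with a collaborator from every other subpopulation.

    If best_idxs is given, use the best individual of each other subpopulation
    (single-best collaboration); otherwise draw a random one (single-random).
    """
    rng = rng or random.Random()
    context = []
    for k, subpop in enumerate(subpops):
        if k == subpop_idx:
            chosen = subpop[indiv_idx]      # the individual being evaluated
        elif best_idxs is not None:
            chosen = subpop[best_idxs[k]]   # best collaborator
        else:
            chosen = rng.choice(subpop)     # random collaborator
        context.extend(chosen)
    return context


subpops = [
    [[1, 0], [0, 0]],        # subpopulation 0: two individuals, two features
    [[1, 1, 0], [0, 1, 1]],  # subpopulation 1: two individuals, three features
]
# Individual 1 of subpopulation 0, paired with the best of subpopulation 1 (index 0).
best_cv = build_context_vector(subpops, subpop_idx=0, indiv_idx=1, best_idxs=[0, 0])
# The same individual, paired with a randomly drawn collaborator.
random_cv = build_context_vector(subpops, subpop_idx=0, indiv_idx=1, rng=random.Random(0))
print(best_cv, random_cv)
```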
_init_evaluator
This method sets up the fitness function that evaluates complete candidate solutions and guides the evolutionary process. For example:
def _init_evaluator(self):
    """Instantiate evaluation method."""
    evaluator = WrapperEvaluation(
        task=self.conf["wrapper"]["task"],
        model_type=self.conf["wrapper"]["model_type"],
        eval_function=self.conf["evaluation"]["eval_function"],
        eval_mode=self.eval_mode,
        n_classes=getattr(self.data, "n_classes", None)
    )
    self.fitness_function = SubsetSizePenalty(
        evaluator=evaluator,
        weights=self.conf["evaluation"]["weights"]
    )
Here, WrapperEvaluation typically wraps a model evaluation, while SubsetSizePenalty adds a penalty term to encourage compact feature subsets.
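One common way to combine the two terms is a weighted difference between the model score and the fraction of selected features. The sketch below shows that form. It illustrates the general idea only and is not necessarily the exact formula used by SubsetSizePenalty; the function name and the ordering of the two weights are hypothetical.

```python
def penalized_fitness(score, context_vector, weights):
    """Weighted model score minus a weighted subset-size penalty.

    score: model performance on the selected features, in [0, 1]
    context_vector: binary mask over all features (1 = selected)
    weights: (w_score, w_penalty), a hypothetical ordering
    """
    w_score, w_penalty = weights
    selected_ratio = sum(context_vector) / len(context_vector)
    return w_score * score - w_penalty * selected_ratio


# Two subsets with the same accuracy: the smaller one gets the higher fitness.
large = penalized_fitness(0.95, [1, 1, 1, 1, 0, 1, 1, 1], weights=(0.9, 0.1))
small = penalized_fitness(0.95, [1, 0, 0, 1, 0, 0, 0, 0], weights=(0.9, 0.1))
print(large, small)
```

Under this form, the weights set the trade-off: a larger penalty weight favors smaller subsets even at some cost in model score.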
_init_subpop_initializer
This method defines how to generate initial subpopulations of individuals, marking the start of co-evolution. For example:
def _init_subpop_initializer(self):
    """Instantiate subpopulation initialization method."""
    self.initializer = RandomBinaryInitialization(
        data=self.data,
        subcomp_sizes=self.subcomp_sizes,
        subpop_sizes=self.subpop_sizes,
        collaborator=self.random_collaborator,
        fitness_function=self.fitness_function
    )
As its name suggests, RandomBinaryInitialization represents each individual as a randomly initialized binary vector (e.g., feature inclusion/exclusion).
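A minimal sketch of random binary initialization is shown below, assuming one subpopulation per subcomponent and one bit per feature. It covers only the generation of individuals; the initializer in the snippet above also receives a collaborator and a fitness function, which this sketch leaves out. The function name random_binary_subpops is hypothetical.

```python
import random


def random_binary_subpops(subcomp_sizes, subpop_sizes, seed=None):
    """Create one subpopulation of random binary individuals per subcomponent."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_individuals)]
        for n_features, n_individuals in zip(subcomp_sizes, subpop_sizes)
    ]


# Two subcomponents with 3 and 2 features, and 4 individuals in each subpopulation.
subpops = random_binary_subpops(subcomp_sizes=[3, 2], subpop_sizes=[4, 4], seed=42)
for subpop in subpops:
    print(subpop)
```

Passing a fixed seed makes the initial subpopulations reproducible across runs.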
_init_optimizers
This method instantiates evolutionary optimizers to independently evolve each subpopulation. For example:
def _init_optimizers(self):
    """Instantiate evolutionary algorithms to evolve each subpopulation."""
    self.optimizers = []
    for i in range(self.n_subcomps):
        optimizer = BinaryGeneticAlgorithm(
            subpop_size=self.subpop_sizes[i],
            n_features=self.subcomp_sizes[i],
            conf=self.conf
        )
        self.optimizers.append(optimizer)
Each subpopulation uses a BinaryGeneticAlgorithm tailored for binary representations.
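For readers unfamiliar with genetic algorithms on bit strings, the sketch below shows one generation built from three textbook operators: binary tournament selection, one-point crossover, and bit-flip mutation. It is a generic illustration, not the operators or parameters of PyCCEA's BinaryGeneticAlgorithm; the function evolve_once and its mutation_rate argument are hypothetical.

```python
import random


def evolve_once(subpop, fitness, mutation_rate=0.1, rng=None):
    """Produce a new subpopulation of the same size from binary individuals."""
    rng = rng or random.Random()

    def tournament():
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(range(len(subpop)), 2)
        return subpop[a] if fitness[a] >= fitness[b] else subpop[b]

    offspring = []
    while len(offspring) < len(subpop):
        parent_a, parent_b = tournament(), tournament()
        # One-point crossover; with a single feature the child copies parent_b.
        cut = rng.randint(1, len(parent_a) - 1) if len(parent_a) > 1 else 0
        child = parent_a[:cut] + parent_b[cut:]
        # Bit-flip mutation: flip each bit with probability mutation_rate.
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
        offspring.append(child)
    return offspring


subpop = [[1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0]]
fitness = [0.70, 0.65, 0.80, 0.50]
new_subpop = evolve_once(subpop, fitness, mutation_rate=0.1, rng=random.Random(1))
print(new_subpop)
```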
_init_decomposer
This method defines how the original problem is split into subproblems (subcomponents), each handled by a separate subpopulation. For example:
def _init_decomposer(self):
    """Instantiate feature grouping method."""
    self.decomposer = SequentialFeatureGrouping(
        n_subcomps=self.n_subcomps,
        subcomp_sizes=self.subcomp_sizes
    )
SequentialFeatureGrouping divides features into sequential groups. Decomposition strategies vary and can strongly affect algorithm performance.
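The difference between sequential and random grouping can be sketched with plain index lists. The code below is an illustration under simple assumptions (fixed subcomponent sizes that add up to the number of features), not PyCCEA's decomposers; both function names are hypothetical.

```python
import random


def sequential_groups(n_features, subcomp_sizes):
    """Assign consecutive feature indices to each subcomponent."""
    if sum(subcomp_sizes) != n_features:
        raise ValueError("subcomp_sizes must add up to n_features")
    groups, start = [], 0
    for size in subcomp_sizes:
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def random_groups(n_features, subcomp_sizes, seed=None):
    """Shuffle the feature indices first, then split them sequentially."""
    order = list(range(n_features))
    random.Random(seed).shuffle(order)
    positions = sequential_groups(n_features, subcomp_sizes)
    groups = [[order[i] for i in group] for group in positions]
    return groups, order


print(sequential_groups(7, [3, 2, 2]))
groups, feature_order = random_groups(7, [3, 2, 2], seed=0)
print(groups, feature_order)
```

A shuffled order like feature_order is the kind of internal reordering that has to be undone when mapping a context vector back to the original feature names.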
optimize
This is the main method orchestrating the entire co-evolutionary optimization workflow, including decomposition, initialization, evolution, evaluation, and convergence checks.
def optimize(self):
    """Solve the feature selection problem through optimization."""
    self._problem_decomposition()
    self._init_subpopulations()
    self._init_optimizers()
    self.current_best = self._get_best_individuals(
        subpops=self.subpops,
        fitness=self.fitness,
        context_vectors=self.context_vectors
    )
    self.best_context_vector, self.best_fitness = self._get_global_best()
    self.best_context_vectors.append(self.best_context_vector.copy())
    self.best_feature_idxs = self.feature_idxs.copy()
    n_gen = 0
    stagnation_counter = 0
    while n_gen <= self.conf["coevolution"]["max_gen"]:
        self.convergence_curve.append(self.best_fitness)
        # Evolve subpopulations independently
        current_subpops = [
            self.optimizers[i].evolve(self.subpops[i], self.fitness[i])
            for i in range(self.n_subcomps)
        ]
        # Evaluate evolved individuals collaboratively
        current_fitness, current_context_vectors = [], []
        for i in range(self.n_subcomps):
            current_fitness.append([])
            current_context_vectors.append([])
            for j in range(self.subpop_sizes[i]):
                collaborators = self.best_collaborator.get_collaborators(
                    subpop_idx=i,
                    indiv_idx=j,
                    current_subpops=current_subpops,
                    current_best=self.current_best
                )
                context_vector = self.best_collaborator.build_context_vector(collaborators)
                current_context_vectors[i].append(context_vector.copy())
                fitness_value = self.fitness_function.evaluate(context_vector, self.data)
                current_fitness[i].append(fitness_value)
        # Update subpopulations and fitness
        self.subpops = copy.deepcopy(current_subpops)
        self.fitness = copy.deepcopy(current_fitness)
        self.context_vectors = copy.deepcopy(current_context_vectors)
        self.current_best = self._get_best_individuals(
            subpops=self.subpops,
            fitness=self.fitness,
            context_vectors=self.context_vectors
        )
        best_context_vector, best_fitness = self._get_global_best()
        if self.best_fitness < best_fitness:
            stagnation_counter = 0
            self.best_context_vector = best_context_vector.copy()
            self.best_context_vectors.append(self.best_context_vector.copy())
            self.best_fitness = best_fitness
        else:
            stagnation_counter += 1
            if stagnation_counter >= self.conf["coevolution"]["max_gen_without_improvement"]:
                break
        n_gen += 1
The optimize method demonstrates the typical CCEA workflow:
Decomposition and initialization: Decompose the problem and initialize subpopulations and optimizers.
Evolution loop: Repeat for a maximum number of generations or until stagnation.
Subpopulation evolution: Evolve each subpopulation independently.
Collaboration and evaluation: Collaborate across subpopulations to build complete solutions and evaluate fitness.
Best solution tracking: Update global best solution and check convergence.
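The loop control in this workflow (a generation budget plus early stopping on stagnation) can be isolated in a few lines. In the sketch below, a list of per-generation fitness values stands in for the evolve, collaborate, and evaluate steps; the function run_until_stagnation and its arguments are hypothetical, and it assumes fitness is maximized, as in the comparison used by optimize.

```python
from itertools import islice


def run_until_stagnation(candidate_fitness, initial_best, max_gen, max_gen_without_improvement):
    """Track the best fitness over generations, stopping early on stagnation.

    candidate_fitness: best fitness produced in each generation (a stand-in
    for the evolve/collaborate/evaluate steps of a real CCEA)
    """
    best_fitness = initial_best
    convergence_curve = []
    stagnation_counter = 0
    for fitness in islice(candidate_fitness, max_gen):
        convergence_curve.append(best_fitness)
        if fitness > best_fitness:
            # Improvement: keep the new best and reset the counter.
            best_fitness = fitness
            stagnation_counter = 0
        else:
            # No improvement: stop once the limit is reached.
            stagnation_counter += 1
            if stagnation_counter >= max_gen_without_improvement:
                break
    return best_fitness, convergence_curve


best, curve = run_until_stagnation(
    candidate_fitness=[0.70, 0.75, 0.74, 0.75, 0.73, 0.80],
    initial_best=0.60,
    max_gen=10,
    max_gen_without_improvement=3,
)
print(best, curve)
```

In this run the loop stops after three generations without improvement, so the final candidate value of 0.80 is never evaluated. That is the trade-off a small stagnation limit makes.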
By overriding these key _init_* methods and optimize(), you can flexibly customize the co-evolutionary process to fit your problem domain, defining collaboration, evaluation, initialization, optimization, decomposition, and the optimization flow itself.