7 Revolutionary Breakthroughs in Gene Network Mapping (And 1 Costly Mistake to Avoid)

[Figure: BIO-INSIGHT workflow with gene network mapping]

In the fast-evolving world of computational biology, one challenge has remained stubbornly complex: mapping gene regulatory networks (GRNs). These intricate systems control how genes turn on and off, shaping everything from cell development to disease progression. For years, scientists have struggled with inconsistent results, noisy data, and methods that prioritize mathematical elegance over biological reality.

But now, a groundbreaking new tool called BIO-INSIGHT is changing the game. Developed by researchers at the University of Málaga and Catholic University of Valencia, this open-source software leverages a parallel many-objective evolutionary algorithm to produce GRNs that are not only more accurate but also biologically meaningful.

In this article, we’ll explore the 7 revolutionary breakthroughs behind BIO-INSIGHT — and the one costly mistake many researchers still make when inferring gene networks.


What Are Gene Regulatory Networks (GRNs)?

Gene regulatory networks are like the brain of a cell. They determine which genes are activated or silenced in response to environmental cues, developmental signals, or disease states. Understanding GRNs is essential for:

  • Identifying disease biomarkers
  • Discovering new drug targets
  • Advancing personalized medicine

However, inferring these networks from gene expression data is notoriously difficult. Traditional methods often produce conflicting results, even when analyzing the same dataset.

The Problem: More than two dozen machine learning techniques exist for GRN inference, yet they rarely agree. This inconsistency undermines confidence in their predictions.


The Consensus Crisis in GRN Inference

A 2025 study published in Computers in Biology and Medicine revealed a startling truth: even high-accuracy inference techniques produce vastly different networks. For example, in one test:

  • NONLINEARODES predicted a central regulatory hub.
  • GENIE3_ET ignored that hub entirely, favoring a decentralized structure.

This lack of consensus isn’t just an academic concern — it leads to false leads in drug discovery and missed biological insights.

💡 Key Insight: Relying on a single inference method is like navigating a maze with one map — you might get somewhere, but it won’t be the optimal path.


Introducing BIO-INSIGHT: The Game-Changer

BIO-INSIGHT (Biologically Informed Optimizer – INtegrating Software to Infer GRNs by Holistic Thinking) is a paradigm shift. Instead of choosing one method, it combines 26 state-of-the-art inference techniques into a unified consensus framework.

But here’s what makes it revolutionary: it doesn’t just average results. It uses a many-objective evolutionary algorithm to optimize for biological plausibility, not just statistical fit.

✅ Key Features:

  • Open-source & free (MIT license)
  • Built on Java + JMetal framework
  • Integrates 26+ inference tools (ARACNE, GENIE3, TIGRESS, etc.)
  • Parallel, asynchronous processing for large-scale networks
  • Python library available on PyPI: GENECI

7 Revolutionary Breakthroughs in BIO-INSIGHT

Let’s dive into the 7 game-changing innovations that make BIO-INSIGHT outperform all existing methods.


1. Six Biologically Driven Objectives (Not Just Two)

Most consensus tools optimize for basic network properties like degree distribution. BIO-INSIGHT goes much further, evaluating networks across six biologically grounded objectives:

OBJECTIVE | BIOLOGICAL RELEVANCE
Quality | Promotes high-confidence, coherent interactions
Motifs | Favors known regulatory patterns (e.g., feedforward loops)
Eigenvector Distribution | Ensures influence follows a power-law (hub genes matter)
Reduce Non-Essential Interactions | Removes redundant edges via edge betweenness
Degree Distribution | Enforces scale-free topology
Dynamicity | Tests network stability under ODE simulations

This multi-faceted approach ensures networks are not just accurate, but biologically realistic.


2. Eigenvector Centrality: The Hidden Influence Metric

While most tools focus on node degree (number of connections), BIO-INSIGHT uses eigenvector centrality to identify genes that are influential because they connect to other influential genes.

This is crucial because in real GRNs, a gene with few connections can still be a master regulator if it controls key hubs.

The eigenvector score for node i is computed as:

$$A\mathbf{x} = \lambda\mathbf{x}$$

where A is the adjacency matrix, x is the eigenvector, and λ is the largest eigenvalue.

BIO-INSIGHT tests whether these scores follow a Pareto (power-law) distribution, a hallmark of biological networks.
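
To make this concrete, here is a minimal, self-contained sketch (not taken from the BIO-INSIGHT codebase) of computing eigenvector centrality with networkx and checking a Pareto fit with scipy. The Barabasi-Albert graph is a stand-in for an inferred GRN, and the fit and test choices are illustrative assumptions.

import networkx as nx
import numpy as np
from scipy.stats import kstest, pareto

# Stand-in network: a Barabasi-Albert graph, scale-free by construction
G = nx.barabasi_albert_graph(n=200, m=2, seed=42)

# Eigenvector centrality solves Ax = lambda*x by power iteration
centrality = nx.eigenvector_centrality(G, max_iter=1000)

# Test whether the positive scores are plausibly Pareto distributed,
# mirroring the check BIO-INSIGHT applies to candidate networks
scores = np.array([v for v in centrality.values() if v > 0])
params = pareto.fit(scores)
p_value = kstest(scores, 'pareto', args=params).pvalue
print(f"Pareto goodness-of-fit p-value: {p_value:.3f}")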


3. Dynamic Stability via ODE Modeling

A network might look good on paper — but is it stable? BIO-INSIGHT simulates gene activity over time using a system of nonlinear ordinary differential equations (ODEs):

$$ \frac{dy_i}{dt} = f\Bigg(\sum_{j} A_{ji}\,y_j\Bigg) - y_i $$

where:

  • y_i(t) = expression level of gene i
  • A_ji = regulatory weight from gene j to gene i
  • f(x) = Hill function modeling nonlinear regulation

The Hill function is defined as:

$$ f(x) = \frac{x^n}{k^n + x^n} $$

with n = 2 and k = 0.5, parameters chosen for biological plausibility.

Networks that quickly converge to a stable state score higher, filtering out chaotic or unstable predictions.
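
As a rough illustration of this objective, the sketch below simulates the ODE system above for a small random network using scipy's solve_ivp. The 3-gene network and its weights are made up for demonstration.

import numpy as np
from scipy.integrate import solve_ivp

def hill(x, n=2, k=0.5):
    """Hill activation f(x) = x^n / (k^n + x^n)."""
    return x**n / (k**n + x**n)

def grn_ode(t, y, A):
    """dy_i/dt = f(sum_j A_ji * y_j) - y_i for every gene i."""
    return hill(A.T @ y) - y

# Made-up 3-gene network with random regulatory weights in [0, 1)
rng = np.random.default_rng(0)
A = rng.random((3, 3))
sol = solve_ivp(grn_ode, t_span=(0, 50), y0=np.ones(3), args=(A,))

# A stable network settles near a fixed point by the end of the run
print("final state:", np.round(sol.y[:, -1], 3))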


4. Edge Betweenness to Eliminate Redundancy

Many inferred networks are overly dense, filled with redundant interactions that obscure true regulatory signals.

BIO-INSIGHT uses edge betweenness centrality to penalize low-impact edges:

$$CB(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$

where σ_st is the number of shortest paths from node s to node t, and σ_st(e) is the number of those paths that pass through edge e.

Edges with low betweenness are likely redundant. BIO-INSIGHT favors networks where even the weakest edges have high centrality — a sign of functional necessity.
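
The following toy sketch shows the idea with networkx: confidences are converted to costs so that betweenness reflects how often an edge carries shortest regulatory paths. The graph and weights are invented for illustration.

import networkx as nx

# Made-up weighted network; higher weight = stronger predicted interaction
G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 0.9), ("B", "C", 0.8),
                           ("A", "C", 0.2), ("C", "D", 0.7)])

# Betweenness treats weights as distances, so convert confidence to cost
for u, v, d in G.edges(data=True):
    d["cost"] = 1.0 - d["weight"]

eb = nx.edge_betweenness_centrality(G, weight="cost", normalized=True)

# Low-betweenness edges (like the weak A->C shortcut) are redundancy candidates
for edge, score in sorted(eb.items(), key=lambda kv: kv[1]):
    print(edge, round(score, 3))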


5. Asynchronous Parallel Evolution: Speed Without Sacrifice

With six objectives, the computational cost could be crippling. But BIO-INSIGHT uses an asynchronous, parallel evolutionary model that:

  • Allows simultaneous evaluation across generations
  • Implements caching to avoid redundant calculations
  • Uses adaptive rounding in early evolution stages
  • Eliminates intermediate conversions

Result? It handles networks of up to 2,000 genes, despite operating in an objective space twice as large as that of previous tools.
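
BIO-INSIGHT itself is built in Java on the jMetal framework, but the asynchronous pattern can be sketched in a few lines of Python: submit evaluations to a process pool, consume results as each finishes, and cache repeated evaluations. The fitness function here is a placeholder, not one of the real objectives.

from concurrent.futures import ProcessPoolExecutor, as_completed
from functools import lru_cache

@lru_cache(maxsize=4096)
def evaluate(weights):
    """Placeholder for the six-objective evaluation; lru_cache means a
    repeated weight vector is never recomputed within a worker process."""
    return sum(w ** 2 for w in weights)

def evaluate_population_async(population):
    """Submit every individual, then consume results as each finishes
    instead of waiting for the whole generation (asynchronous style)."""
    results = {}
    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(evaluate, ind): ind for ind in population}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

if __name__ == "__main__":
    population = [(i / 10, 1 - i / 10) for i in range(11)]
    print(evaluate_population_async(population))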


6. Proven Superiority: +30% Accuracy Gain

BIO-INSIGHT was tested on a benchmark of 106 gene networks from DREAM challenges, SynTReN, GeneNetWeaver, and real databases (BioGRID, RegulonDB).

It was compared against:

  • MO-GENECI (previous state-of-the-art)
  • Mean, median, weighted mean, Bayesian fusion
  • 26 individual inference methods

Results? BIO-INSIGHT dominated in both AUROC and AUPR metrics.

METHOD | AUPR RANK | AUROC RANK
BEST_BIO-INSIGHT | 1.52 | 1.70
BEST_MO-GENECI | 2.84 | 2.94
Bayesian Fusion | 5.58 | 5.77

Statistical testing (Friedman + Holm) confirmed p-values < 1e-20 — meaning the improvement is overwhelmingly significant.
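
For readers who want to reproduce this kind of analysis, here is a minimal sketch of a Friedman test with scipy on per-network scores. The numbers below are hypothetical, not the paper's results; a Holm correction would then be applied to the pairwise follow-up comparisons.

from scipy.stats import friedmanchisquare

# Hypothetical AUPR scores for three methods across six benchmark networks
bio_insight = [0.61, 0.58, 0.70, 0.66, 0.59, 0.72]
mo_geneci   = [0.52, 0.50, 0.61, 0.57, 0.51, 0.63]
bayes_fuse  = [0.40, 0.43, 0.48, 0.45, 0.41, 0.49]

stat, p = friedmanchisquare(bio_insight, mo_geneci, bayes_fuse)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.2e}")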


7. Real-World Clinical Impact: Fibromyalgia & ME/CFS

BIO-INSIGHT isn’t just for academic benchmarks. It was applied to real gene expression data from patients with:

  • Myalgic Encephalomyelitis (ME/CFS)
  • Fibromyalgia (FM)
  • Co-diagnosis (ME/CFS + FM)

The results? It identified condition-specific regulatory interactions with potential biomarker value.

For example:

  • CD74–EIF4G2 interaction was uniquely predicted in ME/CFS
  • Both genes are involved in immune regulation and protein synthesis
  • CD74 is differentially expressed in ME/CFS patients
  • Their physical interaction is experimentally confirmed (Affinity Capture-MS)

This demonstrates that BIO-INSIGHT can uncover biologically relevant, clinically actionable insights, even in complex, poorly understood diseases.


The 1 Costly Mistake You Must Avoid

Despite these advances, many researchers still make a critical error:

They rely on individual inference methods or simplistic consensus (like averaging) without biological guidance.

Why is this a mistake?

  • Mathematical methods ignore biology — they may produce “accurate” networks that lack functional relevance.
  • No trade-off balancing — conflicting objectives (e.g., stability vs. complexity) go unmanaged.
  • Overfitting to specific datasets — techniques that work on DREAM challenges fail on real data.

BIO-INSIGHT avoids this by optimizing for biological coherence, not just predictive accuracy.


How BIO-INSIGHT Works: Step-by-Step

  1. Input: Gene expression data (e.g., RNA-seq, microarray)
  2. Phase 1: Run 26 inference techniques in parallel (Docker containers)
  3. Phase 2: Feed results into the evolutionary algorithm
  4. Phase 3: Optimize a weight vector for each technique
  5. Phase 4: Build consensus network via weighted voting
  6. Phase 5: Evaluate against 6 biological objectives
  7. Output: Pareto front of optimal networks

Users can then select the most biologically plausible network using interactive visualizations.
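
Phases 4 and 5 boil down to a weighted blend of the per-technique confidence matrices. Here is a minimal sketch of that step, with made-up matrices and weights.

import numpy as np

def weighted_consensus(weights, confidence_matrices):
    """Blend per-technique confidence matrices with an evolved weight
    vector (weights sum to 1) into one consensus adjacency matrix."""
    consensus = np.zeros_like(next(iter(confidence_matrices.values())))
    for w, matrix in zip(weights, confidence_matrices.values()):
        consensus += w * matrix
    return consensus

# Made-up example: three techniques scoring a 4-gene network
rng = np.random.default_rng(1)
matrices = {f"tech_{i}": rng.random((4, 4)) for i in range(3)}
print(weighted_consensus(np.array([0.5, 0.3, 0.2]), matrices).round(2))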


Validation: Does It Really Work?

The study answered six key research questions, and every answer supports the consensus approach:

RQ | QUESTION | ANSWER
1 | Is there enough disparity to justify consensus? | ✅ Yes — networks differ wildly
2 | Do objectives conflict (ensuring trade-offs)? | ✅ Yes — proper evolutionary balance
3 | Do accurate techniques get higher weights? | ❌ No — the holistic view wins
4 | Are networks biologically coherent? | ✅ Yes — scale-free, modular, motif-rich
5 | Is biological optimization aligned with accuracy? | ✅ Yes — higher optimization = higher AUPR/AUROC
6 | Is BIO-INSIGHT redundant with individual methods? | ❌ No — it adds novel value

This rigorous validation confirms BIO-INSIGHT isn’t just another tool — it’s a scientific advancement.


How to Use BIO-INSIGHT (Free & Easy)

You don’t need to be a bioinformatics expert. Here’s how to get started:

# Install via PyPI
pip install GENECI==3.0.1

# Run BIO-INSIGHT
bio-insight run --data expression_data.csv --output results/

Or access the source code on GitHub: 👉 https://github.com/AdrianSeguraOrtiz/BIO-INSIGHT

Includes:

  • Docker support
  • Interactive visualizations
  • Benchmark generators
  • Pre-configured datasets

The Future of GRN Inference

BIO-INSIGHT is just the beginning. Future plans include:

  • Integration of deep learning models
  • Clustering-based optimization to reduce redundancy
  • Support for single-cell RNA-seq
  • Dynamic encoding beyond simple weight vectors

As the authors note: “The current encoding may be too simplistic for the increasing biological complexity.”



Final Call to Action

If you’re working on gene network inference, stop relying on outdated, inconsistent methods. The future is biologically guided, multi-objective consensus.

👉 Download BIO-INSIGHT today from the GitHub repository above and see the difference for yourself.

Join the revolution in systems biology — where accuracy meets biological truth.

Below is an illustrative Python sketch of the core logic of BIO-INSIGHT, including an asynchronous-style parallel evolutionary loop and the six biologically inspired objective functions. It follows the paper's description rather than the official Java/jMetal codebase, and several details (the objective normalizations, the motif set, the statistical fits) are simplified assumptions.

# BIO-INSIGHT: Biologically Informed Optimizer INtegrating Software to Infer GRNs by Holistic Thinking
# This script is a Python implementation of the model described in the paper:
# "Multifaceted evolution focused on maximal exploitation of domain knowledge for the consensus inference of Gene Regulatory Networks"
# by Adrián Segura-Ortiz, et al. (Computers in Biology and Medicine, 2025)

import numpy as np
import networkx as nx
from scipy.integrate import solve_ivp
from scipy.stats import kstest, pareto
import random
import multiprocessing
from functools import lru_cache

# --- Helper Functions & Classes ---

class Individual:
    """Represents an individual in the evolutionary algorithm population."""
    def __init__(self, weights):
        if not np.isclose(np.sum(weights), 1.0):
            raise ValueError("Weights must sum to 1.")
        self.weights = np.array(weights)
        self.fitness = {} # Dictionary to store fitness for each objective

    def __repr__(self):
        return f"Individual(weights={self.weights.round(3)}, fitness={self.fitness})"

def build_consensus_network(individual, confidence_matrices):
    """
    Builds a weighted consensus network from an individual's weights.
    Args:
        individual (Individual): The individual with weights for each technique.
        confidence_matrices (dict): A dictionary where keys are technique names
                                    and values are their confidence matrices (NxN numpy arrays).
    Returns:
        np.ndarray: The resulting weighted adjacency matrix for the consensus network.
    """
    num_genes = list(confidence_matrices.values())[0].shape[0]
    consensus_matrix = np.zeros((num_genes, num_genes))
    
    for i, tech_name in enumerate(confidence_matrices.keys()):
        consensus_matrix += individual.weights[i] * confidence_matrices[tech_name]
        
    return consensus_matrix

# --- Objective Functions ---

# Objective 1: Quality
def calculate_quality(consensus_matrix, confidence_matrices):
    """
    Promotes interactions with high confidence from coherent techniques.
    Refactored from the description in the paper.
    Higher is better (every objective is maximized by the non-dominated sort).
    """
    num_genes = consensus_matrix.shape[0]
    if num_genes == 0: return 0.0

    quality_scores = []
    
    # Create a stack of confidence matrices for easier vectorized operations
    stacked_matrices = np.stack(list(confidence_matrices.values()), axis=-1)

    for i in range(num_genes):
        for j in range(num_genes):
            if i == j: continue
            
            confidence_values = stacked_matrices[i, j, :]
            
            # Penalize disagreement between techniques via distance from the median
            median_conf = np.median(confidence_values)
            distance_penalty = np.mean(np.abs(confidence_values - median_conf))
            
            # Interaction quality is the consensus confidence penalized by disagreement
            interaction_quality = consensus_matrix[i, j] * (1 - distance_penalty)
            quality_scores.append(interaction_quality)

    if not quality_scores: return 0.0
    
    # Maximize the mean interaction quality directly, so the dominance
    # comparison (which maximizes every objective) treats it consistently
    return np.mean(quality_scores)

# Objective 2: Motifs
@lru_cache(maxsize=1024) # Cache results for binarized networks
def calculate_motifs(binarized_adj_matrix_tuple):
    """
    Favors networks with a high density of known biological motifs.
    Maximizes the count of predefined motifs.
    Higher is better.
    """
    binarized_adj_matrix = np.array(binarized_adj_matrix_tuple)
    G = nx.from_numpy_array(binarized_adj_matrix, create_using=nx.DiGraph)
    motif_count = 0
    
    # Simplified motif search for demonstration (e.g., Feed-Forward Loop)
    # A -> B, A -> C, B -> C
    for u in G.nodes():
        successors_u = set(G.successors(u))
        if len(successors_u) < 2: continue
        for v in successors_u:
            for w in successors_u:
                if v == w: continue
                if G.has_edge(v, w):
                    motif_count += 1
    
    # Normalize by the number of possible triangles to keep the value scaled
    num_nodes = G.number_of_nodes()
    if num_nodes < 3: return 0
    
    # The paper mentions 4 motifs; this is a simplified example.
    # A more complete implementation would search for all specified motifs.
    normalization_factor = num_nodes * (num_nodes - 1) * (num_nodes - 2)
    return motif_count / normalization_factor if normalization_factor > 0 else 0

# Objective 3: Eigenvector Distribution
@lru_cache(maxsize=1024)
def calculate_eigenvector_distribution(adj_matrix_tuple):
    """
    Evaluates if the gene influence (eigenvector centrality) follows a power-law distribution.
    Uses a goodness-of-fit test for a Pareto (power-law) distribution.
    Higher p-value (closer to 1) is better.
    """
    adj_matrix = np.array(adj_matrix_tuple)
    if np.sum(adj_matrix) == 0: return 0.0

    G = nx.from_numpy_array(adj_matrix, create_using=nx.DiGraph)
    
    try:
        # Use weight attribute for centrality calculation
        centrality = nx.eigenvector_centrality(G, weight='weight', max_iter=1000, tol=1e-04)
        centrality_values = np.array(list(centrality.values()))
        
        if len(centrality_values) < 2 or np.all(centrality_values == centrality_values[0]):
            return 0.0
        
        # The paper uses a Pareto test: fit a Pareto distribution with scipy
        # and use the Kolmogorov-Smirnov goodness-of-fit p-value as the score.
        positive_values = centrality_values[centrality_values > 0]
        if len(positive_values) < 2: return 0.0

        params = pareto.fit(positive_values)
        return kstest(positive_values, 'pareto', args=params).pvalue
    except (nx.PowerIterationFailedConvergence, nx.NetworkXError):
        return 0.0 # Return a poor score if centrality doesn't converge or graph is problematic

# Objective 4: Reduce Non-Essential Interactions
@lru_cache(maxsize=1024)
def reduce_non_essential_interactions(adj_matrix_tuple):
    """
    Penalizes networks with redundant interactions (low edge betweenness centrality).
    The goal is to maximize the centrality of even the least important edges.
    Higher is better.
    """
    adj_matrix = np.array(adj_matrix_tuple)
    if np.sum(adj_matrix) == 0: return 0.0
    
    # Build the graph from the actual adjacency matrix, then convert each
    # confidence into a cost (1 - confidence): betweenness expects distances,
    # and inverting the whole matrix would create edges where none exist.
    G = nx.from_numpy_array(adj_matrix, create_using=nx.DiGraph)
    for u, v, d in G.edges(data=True):
        d['distance'] = 1.0 - d['weight']
    
    try:
        # The paper mentions a custom Dijkstra-based implementation for efficiency.
        # Here we use networkx's implementation for simplicity.
        betweenness = nx.edge_betweenness_centrality(G, weight='distance', normalized=True)
        if not betweenness: return 0.0
        
        betweenness_values = np.array(list(betweenness.values()))
        
        # The objective is 1 / weighted_mean of sorted centralities.
        # This is equivalent to maximizing the mean of the least central edges.
        # A simpler proxy is to return the mean value, which we want to maximize.
        return np.mean(betweenness_values)
    except nx.NetworkXError:
        return 0.0

# Objective 5: Degree Distribution
@lru_cache(maxsize=1024)
def calculate_degree_distribution(adj_matrix_tuple):
    """
    Promotes scale-free networks by testing if the degree distribution fits a power-law.
    Higher p-value (closer to 1) is better.
    """
    adj_matrix = np.array(adj_matrix_tuple)
    G = nx.from_numpy_array(adj_matrix, create_using=nx.DiGraph)
    
    # Use an array so the constant-degree check below is vectorized correctly
    degrees = np.array([d for n, d in G.degree(weight='weight')])
    if len(degrees) < 2 or np.all(degrees == degrees[0]):
        return 0.0
        
    positive_degrees = degrees[degrees > 0]
    if len(positive_degrees) < 2: return 0.0
    
    # Use the same Pareto goodness-of-fit test as the eigenvector objective
    params = pareto.fit(positive_degrees)
    return kstest(positive_degrees, 'pareto', args=params).pvalue

# Objective 6: Dynamicity
def ode_system(t, y, A, n=2, k=0.5):
    """
    System of nonlinear ODEs based on the Hill function to model gene regulation:
    dy_i/dt = f(sum_j A_ji * y_j) - y_i, with f the Hill function.
    """
    # Weighted regulatory input received by each gene
    regulatory_input = A.T @ y
    
    # Hill function models saturating nonlinear activation
    activation = (regulatory_input**n) / (k**n + regulatory_input**n)
    
    # First-order degradation term
    dydt = activation - y
    return dydt

@lru_cache(maxsize=1024)
def calculate_dynamicity(adj_matrix_tuple):
    """
    Evaluates the dynamic stability of the network. A stable system will converge.
    We measure stability as 1 minus the average deviation from the initial state.
    Higher is better (closer to 1).
    """
    adj_matrix = np.array(adj_matrix_tuple)
    num_genes = adj_matrix.shape[0]
    if num_genes == 0: return 0.0

    # Initial state: all genes expressed at a baseline level
    y0 = np.ones(num_genes)
    
    # Time span for the simulation
    t_span = [0, 50]
    
    try:
        # Solve the ODE system
        sol = solve_ivp(
            ode_system,
            t_span,
            y0,
            args=(adj_matrix,),
            method='RK45', # Dormand-Prince 5(4)
            dense_output=True
        )
        
        # Get the final state of the system
        final_state = sol.y[:, -1]
        
        # Calculate the average deviation from the initial state (1.0)
        # A stable system should not diverge uncontrollably.
        avg_deviation = np.mean(np.abs(final_state - y0))
        
        # The score is higher for smaller deviations (more stable systems)
        stability_score = 1.0 / (1.0 + avg_deviation)
        return stability_score

    except Exception:
        return 0.0 # Return a poor score if the simulation fails

# --- Evaluation Function ---

def evaluate_individual(individual, confidence_matrices, binarization_threshold=0.5):
    """
    Evaluates an individual against all six objective functions.
    """
    # 1. Build the consensus network
    consensus_matrix = build_consensus_network(individual, confidence_matrices)
    
    # For caching, convert numpy arrays to tuples of tuples
    consensus_tuple = tuple(map(tuple, consensus_matrix))

    # Binarize for motif calculation
    binarized_matrix = (consensus_matrix > binarization_threshold).astype(int)
    binarized_tuple = tuple(map(tuple, binarized_matrix))

    # 2. Calculate fitness for each objective
    # Note on optimization direction: every objective below is expressed so
    # that higher is better, matching the maximizing non-dominated sort
    # (quality, motifs, eigen_dist, reduce_non_essential, degree_dist, dynamicity).
    
    fitness = {}
    fitness['quality'] = calculate_quality(consensus_matrix, confidence_matrices)
    fitness['motifs'] = calculate_motifs(binarized_tuple)
    fitness['eigen_dist'] = calculate_eigenvector_distribution(consensus_tuple)
    fitness['reduce_non_essential'] = reduce_non_essential_interactions(consensus_tuple)
    fitness['degree_dist'] = calculate_degree_distribution(consensus_tuple)
    fitness['dynamicity'] = calculate_dynamicity(consensus_tuple)
    
    individual.fitness = fitness
    return tuple(fitness.values())


# --- Evolutionary Algorithm Components ---

def dominates(ind1, ind2):
    """Checks if individual 1 dominates individual 2."""
    # Assumes all objectives are to be maximized
    better_in_one = False
    for obj in ind1.fitness:
        if ind1.fitness[obj] < ind2.fitness[obj]:
            return False # ind1 is worse in at least one objective
        if ind1.fitness[obj] > ind2.fitness[obj]:
            better_in_one = True
    return better_in_one

def non_dominated_sort(population):
    """Performs non-dominated sorting and returns fronts."""
    fronts = [[]]
    for p in population:
        p.domination_count = 0
        p.dominated_solutions = []
        for q in population:
            if p is q: continue
            if dominates(p, q):
                p.dominated_solutions.append(q)
            elif dominates(q, p):
                p.domination_count += 1
        if p.domination_count == 0:
            fronts[0].append(p)
    
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in p.dominated_solutions:
                q.domination_count -= 1
                if q.domination_count == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
        
    return fronts[:-1] # Return all non-empty fronts

def crowding_distance(front):
    """Calculates crowding distance for individuals in a front."""
    if not front: return
    
    num_objectives = len(list(front[0].fitness.values()))
    for ind in front:
        ind.crowding_distance = 0
        
    for i, obj_key in enumerate(front[0].fitness.keys()):
        front.sort(key=lambda x: x.fitness[obj_key])
        front[0].crowding_distance = float('inf')
        front[-1].crowding_distance = float('inf')
        
        f_min = front[0].fitness[obj_key]
        f_max = front[-1].fitness[obj_key]
        
        if f_max == f_min: continue
            
        for j in range(1, len(front) - 1):
            front[j].crowding_distance += (front[j+1].fitness[obj_key] - front[j-1].fitness[obj_key]) / (f_max - f_min)

def selection(population, k):
    """Selects k individuals using binary tournament selection based on rank and crowding."""
    selected = []
    for _ in range(k):
        p1 = random.choice(population)
        p2 = random.choice(population)
        
        if p1.rank < p2.rank:
            selected.append(p1)
        elif p2.rank < p1.rank:
            selected.append(p2)
        elif p1.crowding_distance > p2.crowding_distance:
            selected.append(p1)
        else:
            selected.append(p2)
    return selected

def simplex_crossover(p1, p2, eta=2):
    """Simplex Crossover (SPX) for weight vectors that sum to 1."""
    n = len(p1.weights)
    c1_weights = np.zeros(n)
    c2_weights = np.zeros(n)
    
    # This is a simplified version. A full SPX is more complex.
    # Using a simple blend crossover and re-normalizing for demonstration.
    alpha = random.random()
    c1_weights = alpha * p1.weights + (1 - alpha) * p2.weights
    c2_weights = (1 - alpha) * p1.weights + alpha * p2.weights
    
    # Ensure weights sum to 1
    c1_weights /= np.sum(c1_weights)
    c2_weights /= np.sum(c2_weights)
    
    return Individual(c1_weights), Individual(c2_weights)

def polynomial_mutation(individual, eta=20, prob_mut=0.1):
    """Polynomial mutation adapted for simplex."""
    if random.random() > prob_mut:
        return
        
    weights = individual.weights.copy()
    n = len(weights)
    
    for i in range(n):
        if random.random() < 1.0 / n:
            delta = 0
            u = random.random()
            if u < 0.5:
                delta = (2 * u)**(1.0 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u))**(1.0 / (eta + 1))
            
            weights[i] += delta
            
    # Project back to the simplex (ensure non-negative and sum to 1)
    weights[weights < 0] = 0
    if np.sum(weights) > 0:
        weights /= np.sum(weights)
    else: # In case all weights become 0
        weights = np.full(n, 1.0/n)
        
    individual.weights = weights


# --- Main BIO-INSIGHT Class ---

class BioInsight:
    def __init__(self, inference_techniques, pop_size=100, num_generations=50, num_cores=4):
        self.inference_techniques = inference_techniques
        self.num_techniques = len(inference_techniques)
        self.pop_size = pop_size
        self.num_generations = num_generations
        self.num_cores = num_cores
        self.population = []
        self.confidence_matrices = {}

    def _initialize_population(self):
        """Initializes the population with random weight vectors."""
        self.population = []
        for _ in range(self.pop_size):
            weights = np.random.rand(self.num_techniques)
            weights /= np.sum(weights)
            self.population.append(Individual(weights))

    def _run_inference_techniques(self, expression_data):
        """
        Placeholder for running the 26 inference techniques.
        In a real scenario, this would execute external tools (e.g., in Docker containers).
        Here, we simulate their output with random confidence matrices.
        """
        print("Simulating execution of individual inference techniques...")
        num_genes = expression_data.shape[0]
        for tech in self.inference_techniques:
            # Simulate output: a matrix of confidence scores between 0 and 1
            self.confidence_matrices[tech] = np.random.rand(num_genes, num_genes)
            np.fill_diagonal(self.confidence_matrices[tech], 0) # No self-regulation
        print("Individual inference finished.")

    def run(self, expression_data):
        """Main execution workflow for BIO-INSIGHT."""
        # 1. Get results from individual techniques
        self._run_inference_techniques(expression_data)
        
        # 2. Initialize population for the evolutionary algorithm
        self._initialize_population()
        
        # 3. Asynchronous-like parallel evaluation of the initial population
        print("Evaluating initial population...")
        with multiprocessing.Pool(self.num_cores) as pool:
            # Create a list of arguments for the evaluation function
            eval_args = [(ind, self.confidence_matrices) for ind in self.population]
            # Map the function to the arguments
            results = pool.starmap(evaluate_individual, eval_args)
        
        for i, ind in enumerate(self.population):
            ind.fitness = dict(zip(['quality', 'motifs', 'eigen_dist', 'reduce_non_essential', 'degree_dist', 'dynamicity'], results[i]))
        
        # 4. Main evolutionary loop (NSGA-II)
        for gen in range(self.num_generations):
            print(f"\n--- Generation {gen + 1}/{self.num_generations} ---")
            
            # Non-dominated sort and crowding distance
            fronts = non_dominated_sort(self.population)
            for i, front in enumerate(fronts):
                for ind in front:
                    ind.rank = i
                crowding_distance(front)
            
            # Selection
            parents = selection(self.population, self.pop_size)
            
            # Crossover and Mutation to create offspring
            offspring = []
            for i in range(0, self.pop_size, 2):
                p1 = parents[i]
                p2 = parents[i+1]
                c1, c2 = simplex_crossover(p1, p2)
                polynomial_mutation(c1)
                polynomial_mutation(c2)
                offspring.extend([c1, c2])
            
            # Evaluate offspring in parallel
            with multiprocessing.Pool(self.num_cores) as pool:
                eval_args = [(ind, self.confidence_matrices) for ind in offspring]
                results = pool.starmap(evaluate_individual, eval_args)
            
            for i, ind in enumerate(offspring):
                ind.fitness = dict(zip(['quality', 'motifs', 'eigen_dist', 'reduce_non_essential', 'degree_dist', 'dynamicity'], results[i]))

            # Combine parent and offspring population
            combined_pop = self.population + offspring
            
            # Select next generation
            fronts = non_dominated_sort(combined_pop)
            next_population = []
            for front in fronts:
                if len(next_population) + len(front) <= self.pop_size:
                    next_population.extend(front)
                else:
                    crowding_distance(front)
                    front.sort(key=lambda x: x.crowding_distance, reverse=True)
                    remaining_space = self.pop_size - len(next_population)
                    next_population.extend(front[:remaining_space])
                    break
            
            self.population = next_population
            
            # Log the best fitness from the first front
            best_in_front = max(self.population, key=lambda x: x.fitness['quality'])
            print(f"Sample best fitness (quality): {best_in_front.fitness['quality']:.4f}")

        # 5. Return the final Pareto front
        final_front = non_dominated_sort(self.population)[0]
        print(f"\nEvolution finished. Found {len(final_front)} solutions in the Pareto front.")
        return final_front


if __name__ == '__main__':
    # --- Example Usage ---
    
    # List of inference techniques (as mentioned in the paper)
    # A subset is used here for brevity.
    TECHNIQUES = [
        "ARACNE", "BC3NET", "C3NET", "CLR", "GENIE3_RF", "GRNBOOST2", 
        "MRNET", "PCIT", "TIGRESS", "PLSNET"
    ]
    
    # Simulate gene expression data (e.g., 50 genes, 100 samples)
    NUM_GENES = 50
    NUM_SAMPLES = 100
    mock_expression_data = np.random.rand(NUM_GENES, NUM_SAMPLES)
    
    # Instantiate and run BIO-INSIGHT
    bio_insight_model = BioInsight(
        inference_techniques=TECHNIQUES,
        pop_size=50,          # Smaller for quick demo
        num_generations=20,   # Smaller for quick demo
        num_cores=multiprocessing.cpu_count()
    )
    
    pareto_front_solutions = bio_insight_model.run(mock_expression_data)
    
    # Display one of the solutions from the Pareto front
    if pareto_front_solutions:
        print("\n--- Example Solution from Pareto Front ---")
        best_solution = pareto_front_solutions[0]
        print(f"Weights: {best_solution.weights.round(3)}")
        print("Fitness values:")
        for obj, val in best_solution.fitness.items():
            print(f"  - {obj}: {val:.4f}")

        # We can now build the final network from this solution
        final_network = build_consensus_network(best_solution, bio_insight_model.confidence_matrices)
        print(f"\nGenerated a {final_network.shape[0]}x{final_network.shape[0]} consensus network matrix.")
