FRIES: A Groundbreaking Framework for Inconsistency Estimation of Saliency Metrics

Diagram illustrating the FRIES framework for estimating inconsistency in saliency metrics across deep learning models and perturbations.

Unlocking Trust in AI: Introducing FRIES – The First Framework for Inconsistency Estimation of Saliency Metrics

As artificial intelligence (AI) becomes increasingly embedded in high-stakes domains like healthcare, finance, and autonomous systems, the need for explainable AI (XAI) has never been greater. One of the most widely used tools in XAI is the saliency map, which highlights the regions of an input image that a deep learning (DL) model deems important for its prediction. However, a critical question remains: How reliable are these explanations?

While numerous saliency metrics—such as AOPC, Faithfulness, and ROAR—have been proposed to evaluate the fidelity of saliency maps, recent research has revealed a troubling flaw: inconsistency. The same saliency map can receive vastly different fidelity scores depending on the type of perturbation used during evaluation. This undermines trust in both the metrics and the explanations they are meant to validate.

Enter FRIES—the Framework for Inconsistency Estimation of Saliency Metrics—a revolutionary approach that shifts the paradigm from measuring inconsistency to predicting it. Developed by researchers at the Norwegian University of Science and Technology (NTNU) and the University of Paderborn, FRIES offers a scalable, model-agnostic, and perturbation-agnostic solution to one of the most pressing challenges in XAI today.

In this article, we’ll dive deep into the FRIES framework, explore its architecture, and demonstrate how it enables researchers and practitioners to build more trustworthy AI systems—without the need to compute thousands of saliency maps.


What Are Saliency Metrics—and Why Do They Fail?

Saliency metrics are quantitative tools used to assess how well a saliency map reflects a model’s actual decision-making process. Common metrics include:

  • Area Over the Perturbation Curve (AOPC): Measures the change in prediction probability as pixels are perturbed in order of importance.
  • Faithfulness: Computes the correlation between pixel relevance and the drop in output probability when that pixel is perturbed.
  • Insertion/Deletion Games: Evaluate how quickly a model’s confidence increases (insertion) or decreases (deletion) as pixels are added or removed.
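
To make this deletion-style evaluation concrete, here is a minimal sketch of an AOPC-like deletion curve. It assumes a generic Keras-style classifier and a precomputed saliency map; the masking value (zero) and the step size are illustrative choices, not the paper's exact protocol.

import numpy as np

def deletion_curve(model, image, saliency, target_class, step=100):
    """Mask pixels in descending saliency order and record the class probability."""
    h, w = saliency.shape
    order = np.argsort(-saliency.flatten())   # most salient pixels first
    perturbed = image.copy()
    probs = [model.predict(perturbed[None], verbose=0)[0][target_class]]
    for start in range(0, len(order), step):
        ys, xs = np.unravel_index(order[start:start + step], (h, w))
        perturbed[ys, xs] = 0                  # illustrative masking value
        probs.append(model.predict(perturbed[None], verbose=0)[0][target_class])
    return np.array(probs)

# Usage (model, image, sal_map, cls assumed to exist):
# probs = deletion_curve(model, image, sal_map, cls)
# aopc = np.mean(probs[0] - probs[1:])   # AOPC-style average probability drop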

Despite their popularity, these metrics suffer from a critical flaw: statistical inconsistency. As demonstrated by Tomsett et al. [17], the fidelity ranks assigned by these metrics can vary wildly depending on the perturbation type (e.g., Gaussian blur vs. pixel masking).

This inconsistency stems from two flawed assumptions:

  1. Perturbing a relevant pixel always reduces the model’s output probability.
  2. The magnitude of the drop is proportional to the pixel’s importance.

But real-world models often violate these assumptions. As shown in Figure 2 of the paper, perturbing a pixel can sometimes increase the model’s confidence—leading to unstable and unreliable fidelity scores.


The FRIES Framework: A Paradigm Shift in XAI Evaluation

FRIES flips the script. Instead of relying on saliency maps to compute inconsistency, it predicts inconsistency by analyzing how a model’s output probability changes under various perturbations.

How FRIES Works: A High-Level Overview

FRIES operates as a supervised learning framework that predicts the inconsistency of saliency metrics using features derived from model behavior. The key insight is:

Inconsistency in saliency metrics arises from inconsistency in Pixel/Patch Importance Ranks (PIR), which in turn stems from unpredictable changes in model output probability under perturbation.

Rather than compute saliency maps for every perturbation, FRIES extracts Inconsistency Estimation (IE) features from the model’s response to perturbed inputs and uses them to train a classifier that predicts whether two perturbations will yield consistent fidelity ranks.
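
In code, the core idea is compact: extract a small feature vector from the model's probability responses under two perturbations, then ask a trained classifier whether that pair will produce consistent fidelity ranks. A minimal sketch follows; the feature names mirror those defined later in this article, predict_pair_consistency is a hypothetical helper, and ie_classifier is assumed to be a scikit-learn-style model trained as in the implementation section below.

import numpy as np

def predict_pair_consistency(ie_classifier, ie_features: dict) -> float:
    """Return the predicted probability that two perturbations yield consistent ranks."""
    feature_order = ['RPD_1', 'RPD_2', 'PRC', 'PDC',
                     'MPD_1', 'VAR_1', 'SKEW_1', 'KURT_1',
                     'MPD_2', 'VAR_2', 'SKEW_2', 'KURT_2']
    x = np.array([[ie_features[name] for name in feature_order]])
    return ie_classifier.predict_proba(x)[0, 1]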


The Modular Architecture of FRIES

FRIES is designed for flexibility and extensibility. Its modular architecture supports:

  • Perturbation types: Inpainting (Telea, Navier-Stokes), Gaussian blur (kernel widths 0.9, 1.5), Uniform (min, max, and mean pixel values)
  • Perturbation schemes: Pixel-wise, patch-wise (9×9 patches)
  • IE modeling techniques: XGBoost (boosting), Random Forest (bagging)
  • DL models: Inception-V3, Xception, ResNet-50, Vision Transformers

This modularity allows FRIES to be applied across diverse models, datasets, and use cases—making it a truly general-purpose tool for XAI evaluation.
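
As a rough illustration of how these components combine, an experiment can be described by a small configuration object. The keys below follow this article's naming, not the repository's actual API.

# Illustrative experiment configuration (names are this article's, not the repository's API).
FRIES_CONFIG = {
    'perturbation_types': ['IT', 'IN', 'G9', 'G15', 'U0', 'U1', 'U0.5'],
    'perturbation_scheme': 'patch',   # 'pixel' or 'patch'
    'patch_size': 9,                  # 9x9 patches, as in the paper
    'ie_model': 'XGBoost',            # or 'RandomForest'
    'dl_model': 'InceptionV3',        # Inception-V3, Xception, ResNet-50, ViT
}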


Core IE Features: The Building Blocks of Inconsistency Prediction

FRIES leverages a set of carefully designed features to capture the statistical behavior of model responses under perturbation. These IE features fall into three main categories:

1. Ratio of Probability Drops (RPD)

Measures the proportion of perturbations that cause a decrease in output probability:

\[ RPD_{k}^{\Phi_l} = \frac{1}{S} \sum_{i=1}^{S} \Big[ P(I_{0}^{k}) - P(I_{\Phi_l}^{k, a_i}) > 0 \Big] \]

Where:

\[ P(I_{0}^{k}) \;:\; \text{output probability of the unperturbed image } k \]
\[ P(I_{\Phi_l}^{k, a_i}) \;:\; \text{output probability after perturbing pixel } a_i \text{ with perturbation } \Phi_l \]
\[ [\cdot] \;:\; \text{indicator function (1 if true, 0 otherwise)} \]
\[ S \;:\; \text{number of perturbed pixels} \]

    A low RPD indicates that perturbations often increase confidence—signaling potential inconsistency.
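
    As a quick sanity check, RPD can be computed directly from the recorded probabilities. The snippet below uses toy values and mirrors the calculate_rpd utility in the implementation later in this article.

    import numpy as np

    original_prob = 0.92
    perturbed_probs = np.array([0.88, 0.95, 0.90, 0.93, 0.85])   # toy values
    rpd = np.mean((original_prob - perturbed_probs) > 0)          # fraction of drops
    print(rpd)  # 0.6 -> 60% of perturbations reduced the class probability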

    2. Pixel Rank Correlation (PRC)

    Quantifies the similarity between importance ranks generated by two different perturbations using Rank-Biased Overlap (RBO):

    \[ \text{PRC}(\Phi_x, \Phi_y) = \text{rbo}(R_{\Phi_x}^k, R_{\Phi_y}^k) \]

    An ideal saliency metric should produce identical ranks across perturbations (PRC = 1). Lower values indicate higher inconsistency.

    3. Probability Difference Correlation (PDC)

    Measures the Pearson correlation between the probability changes induced by two perturbations:

    \[ \text{PDC}(\Phi_x, \Phi_y) = \mathrm{Cor}\Big( P(I_{0}^{k}) - P(I_{\Phi_x}^{k, a_{1 \to S}}),\; P(I_{0}^{k}) - P(I_{\Phi_y}^{k, a_{1 \to S}}) \Big) \]

    High PDC suggests stable model behavior; low PDC suggests erratic responses.
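
    Both PRC and PDC start from the same per-pixel probability differences. A minimal sketch, assuming two arrays of probability drops (one per perturbation): PDC is their Pearson correlation, and the rank lists fed to RBO for PRC are simply the pixel indices sorted by drop size (a full RBO implementation appears in fries_utils.py below).

    import numpy as np
    from scipy.stats import pearsonr

    prob_diffs_a = np.array([0.04, -0.02, 0.10, 0.01, 0.07])   # drops under perturbation A
    prob_diffs_b = np.array([0.05, -0.01, 0.08, 0.02, 0.06])   # drops under perturbation B

    pdc, _ = pearsonr(prob_diffs_a, prob_diffs_b)               # Probability Difference Correlation
    ranks_a = np.argsort(-prob_diffs_a)                         # pixel importance ranks for A
    ranks_b = np.argsort(-prob_diffs_b)                         # pixel importance ranks for B
    # prc = rbo(ranks_a, ranks_b, p=0.9)   # Rank-Biased Overlap, defined in fries_utils.py below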

    Additional Higher-Order Statistics

    FRIES also incorporates statistical moments to capture distributional properties:

    • Mean Probability Difference (MPD)
    • Variance (VAR)
    • Skewness (SKEW)
    • Kurtosis (KURT)

    These features help quantify the “shape” of the model’s response distribution, providing deeper insight into its stability.
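
    These moments are one-liners with SciPy. A small sketch over the same toy probability-difference vector used above:

    import numpy as np
    from scipy.stats import skew, kurtosis

    prob_diffs = np.array([0.04, -0.02, 0.10, 0.01, 0.07])
    mpd = np.mean(prob_diffs)     # Mean Probability Difference
    var = np.var(prob_diffs)      # Variance
    skw = skew(prob_diffs)        # Skewness
    kur = kurtosis(prob_diffs)    # (excess) Kurtosis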


    Experimental Validation: FRIES in Action

    The authors evaluated FRIES across:

    • 3 DL models: Inception-V3, Xception, ResNet-50
    • 3 datasets: Imagenette, Oxford-IIIT Pets, PASCAL VOC 2007
    • 7 perturbation types
    • 2 perturbation schemes (pixel-wise, patch-wise)
    • 2 IE classifiers (XGBoost, Random Forest)

    For each combination, they generated 189,000 samples to train and test the IE models.

    Key Results at a Glance

    Saliency metric | Classifier    | Avg. AUC-ROC (pixel-wise) | Avg. AUC-ROC (patch-wise)
    AOPC            | XGBoost       | 0.79                      | 0.71
    Faithfulness    | XGBoost       | 0.81                      | 0.76
    AOPC            | Random Forest | 0.78                      | 0.71
    Faithfulness    | Random Forest | 0.80                      | 0.75

    (Source: Aggregated from Figures 6–9 and Table 2)

    These high ROC-AUC scores confirm that FRIES can accurately predict inconsistency—without ever computing a single saliency map.


    Statistical Rigor: The Permutation Test

    To ensure results weren’t due to chance, the authors conducted a Permutation Test—randomly shuffling target labels and retraining the models. The results were striking:

    • p-value ≈ 0 in all cases
    • No overlap between IE model performance and permuted models (Figures 10–13)

    This provides robust statistical evidence that FRIES’ predictions are meaningful and not artifacts of overfitting.
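
    For readers who want to reproduce this kind of check, a permutation test needs only a label shuffle and a retrain loop. The sketch below uses synthetic features purely for illustration; in practice X and y would be the IE feature matrix and the consistency labels.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 12))                     # stand-in for IE features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # stand-in for consistency labels
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

    observed_auc = roc_auc_score(
        y_te, RandomForestClassifier(random_state=42).fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

    permuted_aucs = []
    for _ in range(100):                                # number of permutations
        y_perm = rng.permutation(y_tr)                  # shuffle the training labels
        clf = RandomForestClassifier(random_state=42).fit(X_tr, y_perm)
        permuted_aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

    # p-value: fraction of permuted models that match or beat the real model.
    p_value = np.mean(np.array(permuted_aucs) >= observed_auc)
    print(observed_auc, p_value)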


    Feature Importance: What Drives Inconsistency?

    Using feature importance analysis (Figures 14–15), the study found that the top three predictors of inconsistency are:

    1. PDC (Probability Difference Correlation)
    2. PRC (Pixel Rank Correlation)
    3. RPD (Ratio of Probability Drops)

    These domain-specific features consistently outperformed generic statistical moments, validating their central role in modeling inconsistency.


    The Perturbation-Agnostic Advantage

    One of FRIES’ most powerful features is its ability to generalize to unseen perturbations.

    In a critical experiment, the authors trained IE models on six perturbation types and tested them on the seventh. The results (Table 2) show:

    • ROC-AUC drop ≤ 8% for unseen perturbations
    • Strong performance even on challenging uniform perturbations (U0, U1, U.5)

    This perturbation-agnostic property means FRIES can be deployed in real-world settings where new perturbation types may emerge—without retraining from scratch.
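
    A sketch of this leave-one-perturbation-out protocol follows. It assumes a DataFrame of IE feature rows that also records which two perturbations each row compares; the columns 'p1' and 'p2' and the FEATURES list in the usage comment are hypothetical additions to the feature table built in fries_main.py below.

    import pandas as pd
    from sklearn.metrics import roc_auc_score
    import xgboost as xgb

    def leave_one_perturbation_out(df: pd.DataFrame, held_out: str, feature_cols: list):
        """Train on pairs that do not involve `held_out`, test on pairs that do."""
        test_mask = (df['p1'] == held_out) | (df['p2'] == held_out)
        train_df, test_df = df[~test_mask], df[test_mask]
        model = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
        model.fit(train_df[feature_cols], train_df['target'])
        preds = model.predict_proba(test_df[feature_cols])[:, 1]
        return roc_auc_score(test_df['target'], preds)

    # Example: hold out the Telea inpainting perturbation ('IT').
    # auc_unseen = leave_one_perturbation_out(df, 'IT', feature_cols=FEATURES)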


    Why FRIES Matters: Implications for AI Trust and Safety

    FRIES isn’t just an academic exercise—it’s a practical tool with real-world impact:

    • Reduces computational cost: Eliminates the need to generate thousands of saliency maps.
    • Enables proactive validation: Researchers can test new saliency metrics before deployment.
    • Improves model debugging: Flags models whose explanations are highly sensitive to perturbation.
    • Supports regulatory compliance: Provides auditable metrics for AI governance frameworks.

    As AI systems face increasing scrutiny, tools like FRIES will be essential for building transparent, accountable, and trustworthy models.


    Try FRIES in Your Research

    The FRIES framework is open-source and available on GitHub:

    👉 GitHub Repository: FRIES Implementation

    The codebase includes:

    • Pre-trained IE models
    • Scripts for feature extraction
    • Support for custom perturbations and classifiers
    • Reproducible experiments across multiple datasets

    Whether you’re developing new saliency methods or auditing existing ones, FRIES provides the tools you need to ensure your explanations are reliable.


    Conclusion: Toward More Reliable Explainable AI

    The FRIES framework represents a major leap forward in the evaluation of saliency metrics. By shifting from post-hoc inconsistency measurement to predictive estimation, it offers a scalable, statistically sound, and flexible solution to a long-standing problem in XAI.

    Key takeaways:

    • Saliency metrics are often inconsistent due to unstable model responses under perturbation.
    • FRIES predicts inconsistency using IE features derived from model behavior.
    • It achieves high accuracy (ROC-AUC up to 0.85) across models, datasets, and perturbations.
    • It generalizes to unseen perturbations with minimal performance drop.
    • It’s open-source and ready for real-world deployment.

    As the AI community moves toward more rigorous evaluation standards, frameworks like FRIES will be indispensable for building systems we can truly trust.


    Call to Action

    Ready to make your AI explanations more reliable?

    1. 🌟 Star the FRIES repository on GitHub to support open science.
    2. 🔬 Run the experiments on your own models and datasets.
    3. 📢 Share your results with the XAI community using #FRIESFramework.
    4. 📧 Contact the authors for collaboration or integration support.

    Together, we can build a future where AI doesn’t just work—it explains itself with consistency, clarity, and confidence.

    Below is an end-to-end Python sketch of the FRIES pipeline described above. Note that the "ground truth" saliency-rank targets are simulated placeholders, so the script demonstrates the framework's structure rather than reproducing the paper's results.

    # fries_models.py
    # Description: This module provides functions to load pre-trained Keras models.
    
    import tensorflow as tf
    
    def load_model(model_name: str, input_shape=(224, 224, 3)):
        """
        Loads a pre-trained Keras model.
    
        Args:
            model_name (str): The name of the model to load ('Xception', 'ResNet50', 'InceptionV3').
            input_shape (tuple): The input shape for the model.
    
        Returns:
            A Keras model instance.
        """
        print(f"Loading model: {model_name}...")
        if model_name.lower() == 'xception':
            # Xception and InceptionV3 typically use 299x299
            if input_shape[0] != 299:
                print("Warning: Xception typically expects 299x299 input. Adjusting shape.")
                input_shape = (299, 299, 3)
            model = tf.keras.applications.Xception(weights='imagenet', include_top=True, input_shape=input_shape)
        elif model_name.lower() == 'resnet50':
            if input_shape[0] != 224:
                print("Warning: ResNet50 typically expects 224x224 input. Adjusting shape.")
                input_shape = (224, 224, 3)
            model = tf.keras.applications.ResNet50(weights='imagenet', include_top=True, input_shape=input_shape)
        elif model_name.lower() == 'inceptionv3':
            if input_shape[0] != 299:
                print("Warning: InceptionV3 typically expects 299x299 input. Adjusting shape.")
                input_shape = (299, 299, 3)
            model = tf.keras.applications.InceptionV3(weights='imagenet', include_top=True, input_shape=input_shape)
        else:
            raise ValueError("Unsupported model name. Choose from 'Xception', 'ResNet50', 'InceptionV3'.")
    
        print(f"{model_name} loaded successfully.")
        return model
    
    # fries_utils.py
    # Description: Utility functions for the FRIES framework, including perturbations,
    # feature calculation, and RBO implementation.
    
    import numpy as np
    import cv2
    import tensorflow as tf  # needed by preprocess_image below
    from scipy.stats import pearsonr, skew, kurtosis
    
    def preprocess_image(img, target_size):
        """Preprocesses a single image for a given model."""
        img = tf.keras.preprocessing.image.img_to_array(img)
        img = cv2.resize(img, target_size)
        img = np.expand_dims(img, axis=0)
        if target_size == (299, 299): # For Inception/Xception
            img = tf.keras.applications.inception_v3.preprocess_input(img)
        else: # For ResNet50
            img = tf.keras.applications.resnet50.preprocess_input(img)
        return img
    
    def apply_perturbation(image: np.ndarray, p_type: str, locations: list, patch_size=9):
        """
        Applies a specified perturbation to an image at given locations.
    
        Args:
            image (np.ndarray): The original image.
            p_type (str): The perturbation type.
            locations (list): A list of (y, x) coordinates for perturbation.
            patch_size (int): The size of the patch for perturbation.
    
        Returns:
            A perturbed image as a numpy array.
        """
        perturbed_image = image.copy()
        h, w, _ = image.shape
        
        for (y, x) in locations:
            top = max(0, y - patch_size // 2)
            bottom = min(h, y + patch_size // 2 + 1)
            left = max(0, x - patch_size // 2)
            right = min(w, x + patch_size // 2 + 1)
    
            if p_type == 'U0': # Uniform min value
                perturbed_image[top:bottom, left:right] = np.min(image)
            elif p_type == 'U1': # Uniform max value
                perturbed_image[top:bottom, left:right] = np.max(image)
            elif p_type == 'U0.5': # Uniform mean value
                perturbed_image[top:bottom, left:right] = np.mean(image)
            elif p_type.startswith('G'): # Gaussian Blur
                sigma = float(p_type[1:]) / 10.0
                kernel_size = (patch_size, patch_size)
                patch = perturbed_image[top:bottom, left:right]
                blurred_patch = cv2.GaussianBlur(patch, kernel_size, sigma)
                perturbed_image[top:bottom, left:right] = blurred_patch
            elif p_type in ['IT', 'IN']: # Inpainting
                mask = np.zeros((h, w), dtype=np.uint8)
                mask[top:bottom, left:right] = 255
                if p_type == 'IT': # Telea
                    perturbed_image = cv2.inpaint(perturbed_image, mask, 3, cv2.INPAINT_TELEA)
                else: # Navier-Stokes
                    perturbed_image = cv2.inpaint(perturbed_image, mask, 3, cv2.INPAINT_NS)
            else:
                raise ValueError(f"Unknown perturbation type: {p_type}")
                
        return perturbed_image
    
    def rbo(list1, list2, p=0.9):
        """
        Calculates Rank-Biased Overlap (RBO) between two lists.
        A high RBO score means the lists are very similar in their ranking.
        """
        overlap = 0.0
        max_len = max(len(list1), len(list2))

        for i in range(max_len):
            depth = i + 1
            set1_at_depth = set(list1[:depth])
            set2_at_depth = set(list2[:depth])

            # Agreement at each depth, geometrically discounted so that earlier
            # ranks count more. Using p ** (depth - 1) lets identical rankings
            # score (arbitrarily close to) 1, matching the PRC = 1 ideal above.
            agreement = len(set1_at_depth.intersection(set2_at_depth))
            overlap += (agreement / depth) * (p ** (depth - 1))

        return (1 - p) * overlap
    
    # --- IE Feature Calculation Functions ---
    
    def calculate_rpd(original_prob, perturbed_probs):
        """Ratio of Probability Drops (RPD)."""
        drops = np.sum((original_prob - perturbed_probs) > 0)
        return drops / len(perturbed_probs) if len(perturbed_probs) > 0 else 0
    
    def calculate_prc(prob_diffs1, prob_diffs2, p_rbo=0.9):
        """Pixel Rank Correlation (PRC) using RBO."""
        # Higher rank for larger probability drop
        ranks1 = np.argsort(-prob_diffs1)
        ranks2 = np.argsort(-prob_diffs2)
        return rbo(ranks1, ranks2, p=p_rbo)
    
    def calculate_pdc(prob_diffs1, prob_diffs2):
        """Probability Difference Correlation (PDC) using Pearson correlation."""
        if len(prob_diffs1) < 2 or len(prob_diffs2) < 2:
            return 0.0
        corr, _ = pearsonr(prob_diffs1, prob_diffs2)
        return corr if not np.isnan(corr) else 0.0
    
    def get_higher_order_stats(prob_diffs):
        """Calculates mean, variance, skewness, and kurtosis."""
        if len(prob_diffs) == 0:
            return 0.0, 0.0, 0.0, 0.0
        mean_pd = np.mean(prob_diffs)
        var_pd = np.var(prob_diffs)
        skew_pd = skew(prob_diffs)
        kurt_pd = kurtosis(prob_diffs)
        return mean_pd, var_pd, skew_pd, kurt_pd
    
    # fries_main.py
    # Description: Main script to run the FRIES experiment.
    # This script simulates the process described in the paper:
    # 1. Load a model and data.
    # 2. For pairs of perturbations, generate IE features.
    # 3. For the same pairs, generate a "ground truth" inconsistency score (target).
    # 4. Train a classifier to predict the inconsistency score from the IE features.
    # 5. Evaluate the classifier.
    
    import os
    import numpy as np
    import tensorflow as tf
    from itertools import combinations
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score, roc_curve
    import xgboost as xgb
    import matplotlib.pyplot as plt
    from tqdm import tqdm
    
    # Import from our other files
    from fries_models import load_model
    from fries_utils import (
        apply_perturbation,
        calculate_rpd,
        calculate_prc,
        calculate_pdc,
        get_higher_order_stats,
        rbo,
        preprocess_image
    )
    
    # --- Configuration ---
    MODEL_NAME = 'Xception' # 'Xception', 'ResNet50', 'InceptionV3'
    DATASET_NAME = 'Imagenette' # 'Imagenette', 'Pets', 'VOC2007'
    NUM_IMAGES = 50 # Use a small number for a quick demo
    NUM_PIXELS_TO_PERTURB = 50
    PATCH_SIZE = 9
    IE_MODEL = 'XGBoost' # 'XGBoost' or 'RandomForest'
    
    def get_prediction(model, processed_image):
        """Get top class prediction probability."""
        preds = model.predict(processed_image, verbose=0)
        top_pred_idx = np.argmax(preds[0])
        return preds[0][top_pred_idx], top_pred_idx
    
    def generate_saliency_ranks(model, image, top_class_idx):
        """
        A placeholder function to generate saliency ranks.
        In a real scenario, this would compute Grad-CAM, etc., and rank them
        based on a metric like AOPC for different perturbations.
        Here, we simulate this by creating mock ranks to demonstrate the framework.
        The inconsistency of these mock ranks will be our target variable.
        """
        # Simulate that different perturbations produce slightly different rankings
        # of saliency methods.
        methods = ['GradCAM', 'GradCAM++', 'LayerCAM', 'EigenCAM', 'XGradCAM']
        base_ranks = {p: np.random.permutation(methods) for p in PERTURBATION_TYPES}
        return base_ranks
    
    def create_dataset():
        """
        Creates the full dataset of IE features and target inconsistency scores.
        """
        # Load a sample of the dataset
        if DATASET_NAME == 'Imagenette':
            # Using tf.keras.utils.get_file to download and cache the dataset
            path_to_zip = tf.keras.utils.get_file(
                'imagenette2-320.zip',
                origin='https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.zip',
                extract=True
            )
            # The archive extracts into an 'imagenette2-320' folder next to the
            # downloaded zip (exact behaviour can vary across Keras versions).
            data_dir = os.path.join(os.path.dirname(path_to_zip), 'imagenette2-320', 'train')
            # Create a dataset from the directory
            image_dataset = tf.keras.utils.image_dataset_from_directory(
                data_dir,
                labels='inferred',
                label_mode='int',
                image_size=(299, 299) if MODEL_NAME != 'ResNet50' else (224, 224),
                batch_size=1,
                shuffle=True
            ).take(NUM_IMAGES)
        else:
            # Placeholder for other datasets
            print(f"Dataset '{DATASET_NAME}' not implemented. Using random data.")
            target_size = (299, 299) if MODEL_NAME != 'ResNet50' else (224, 224)
            # Yield random data as a placeholder
            for _ in range(NUM_IMAGES):
                yield (tf.random.uniform((1, *target_size, 3)), tf.constant([0]))
            return
    
        for image_batch, label_batch in image_dataset:
            yield image_batch, label_batch
    
    
    if __name__ == '__main__':
        # --- 1. Setup ---
        input_shape = (299, 299, 3) if MODEL_NAME != 'ResNet50' else (224, 224, 3)
        model = load_model(MODEL_NAME, input_shape)
        
        PERTURBATION_TYPES = ['IT', 'IN', 'U0', 'U1', 'U0.5', 'G9', 'G15']
        perturbation_pairs = list(combinations(PERTURBATION_TYPES, 2))
        
        features_list = []
    
        print(f"Starting feature extraction for {NUM_IMAGES} images...")
        
        # --- 2. Feature and Target Generation ---
        image_generator = create_dataset()
        for i, (image_batch, _) in tqdm(enumerate(image_generator), total=NUM_IMAGES, desc="Processing Images"):
            
            original_image_np = image_batch[0].numpy().astype(np.uint8)
            h, w, _ = original_image_np.shape
    
            # Randomly select pixels to perturb
            pixel_locations = [
                (np.random.randint(0, h), np.random.randint(0, w))
                for _ in range(NUM_PIXELS_TO_PERTURB)
            ]
    
            # Get original prediction
            processed_original = preprocess_image(original_image_np, target_size=input_shape[:2])
            original_prob, top_class_idx = get_prediction(model, processed_original)
    
            # Store probability differences for each perturbation type
            prob_diffs_all_types = {}
            for p_type in PERTURBATION_TYPES:
                perturbed_probs = []
                for loc in pixel_locations:
                    perturbed_img = apply_perturbation(original_image_np, p_type, [loc], patch_size=PATCH_SIZE)
                    processed_perturbed = preprocess_image(perturbed_img, target_size=input_shape[:2])
                    p_prob, _ = get_prediction(model, processed_perturbed)
                    perturbed_probs.append(p_prob)
                
                prob_diffs_all_types[p_type] = original_prob - np.array(perturbed_probs)
    
            # --- Generate "Ground Truth" Saliency Ranks to calculate target ---
            # This is a simulation, as described in the function docstring
            saliency_ranks_by_perturbation = generate_saliency_ranks(model, original_image_np, top_class_idx)
    
            # --- Create feature vector for each perturbation pair ---
            for p1, p2 in perturbation_pairs:
                prob_diffs1 = prob_diffs_all_types[p1]
                prob_diffs2 = prob_diffs_all_types[p2]
    
                # Calculate IE Features
                rpd1 = calculate_rpd(original_prob, original_prob - prob_diffs1)
                rpd2 = calculate_rpd(original_prob, original_prob - prob_diffs2)
                prc = calculate_prc(prob_diffs1, prob_diffs2)
                pdc = calculate_pdc(prob_diffs1, prob_diffs2)
                
                mpd1, var1, skew1, kurt1 = get_higher_order_stats(prob_diffs1)
                mpd2, var2, skew2, kurt2 = get_higher_order_stats(prob_diffs2)
    
                # Compute the target from the mock saliency ranks using RBO
                # (higher RBO = more consistent ranks, lower RBO = more inconsistent).
                rank_list1 = saliency_ranks_by_perturbation[p1]
                rank_list2 = saliency_ranks_by_perturbation[p2]
                consistency_score = rbo(rank_list1, rank_list2, p=0.9)

                # Binarize the target as in the paper: 1 = consistent pair, 0 = inconsistent pair.
                target = 1 if consistency_score >= 0.5 else 0
    
                features_list.append([
                    rpd1, rpd2, prc, pdc,
                    mpd1, var1, skew1, kurt1,
                    mpd2, var2, skew2, kurt2,
                    target
                ])
    
        # --- 3. Train and Evaluate IE Model ---
        feature_names = [
            'RPD_1', 'RPD_2', 'PRC', 'PDC',
            'MPD_1', 'VAR_1', 'SKEW_1', 'KURT_1',
            'MPD_2', 'VAR_2', 'SKEW_2', 'KURT_2',
            'target'
        ]
        df = pd.DataFrame(features_list, columns=feature_names)
        df.dropna(inplace=True)
    
        print(f"\nGenerated {len(df)} feature vectors.")
        print("Sample of the generated data:")
        print(df.head())
        
        X = df.drop('target', axis=1)
        y = df['target']
    
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
    
        print(f"\nTraining {IE_MODEL} model...")
        if IE_MODEL == 'XGBoost':
            # use_label_encoder is deprecated in recent xgboost versions and omitted here.
            ie_model = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
        else:
            ie_model = RandomForestClassifier(n_estimators=100, random_state=42)
    
        ie_model.fit(X_train, y_train)
    
        # --- 4. Evaluation ---
        y_pred_proba = ie_model.predict_proba(X_test)[:, 1]
        auc_score = roc_auc_score(y_test, y_pred_proba)
        print(f"\n--- Evaluation Results ---")
        print(f"ROC-AUC Score of the IE Model: {auc_score:.4f}")
    
        # Plot ROC Curve
        fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
        plt.figure()
        plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {auc_score:.2f})')
        plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.05])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title(f'ROC Curve for {IE_MODEL} on {MODEL_NAME}')
        plt.legend(loc="lower right")
        plt.grid(True)
        plt.show()
    
        # --- Feature Importance ---
        if hasattr(ie_model, 'feature_importances_'):
            print("\n--- Feature Importances ---")
            importances = ie_model.feature_importances_
            feature_importance_df = pd.DataFrame({'feature': X.columns, 'importance': importances})
            feature_importance_df = feature_importance_df.sort_values('importance', ascending=False)
            print(feature_importance_df)
    
    
