Imagine training cutting-edge AI models with only a fraction of the labeled data you thought you needed. This isn’t fantasy: it’s the promise of Semi-Supervised Learning (SSL). But a hidden enemy sabotages results: noisy pseudo-labels. Traditional methods like FixMatch stumble early when imperfect teacher models flood training with errors. The consequence? Stunted performance, wasted compute, and missed opportunities.
Enter KD-FixMatch, an approach from Amazon researchers that slashes noise and unlocks higher accuracy. This isn’t an incremental tweak; it’s a fundamental shift that leverages knowledge distillation and sequential training to turn SSL’s weakness into strength. By the end of this guide, you’ll understand exactly how it works and why it outperformed FixMatch in every one of the paper’s benchmarks.
Why Noisy Pseudo-Labels Are SSL’s Silent Killer (And FixMatch’s Fatal Flaw)
Semi-supervised learning thrives on using small labeled datasets alongside vast pools of unlabeled data. The dominant approach? Consistency regularization and pseudo-labeling. Here’s the critical pain point:
- The Teacher-Student Trap: Methods like FixMatch use a Siamese Neural Network (SNN) with weight-sharing “teacher” and “student” models trained simultaneously.
- Early-Stage Disaster: In the critical early training phase, the immature teacher generates disastrously noisy pseudo-labels.
- Poisoned Training: The student learns from these errors, propagating inaccuracies and limiting final model performance. Recovery is difficult. The sketch below shows exactly where this noise enters the loss.
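To make the failure mode concrete, here is a minimal sketch of FixMatch’s unlabeled-data update in TensorFlow (the `model`, batch tensors, and `tau` default are illustrative, not the paper’s code): the teacher’s confident prediction on a weakly augmented view becomes the training target for a strongly augmented view, so a confident-but-wrong early prediction is trained straight into the student.
import tensorflow as tf

def fixmatch_unlabeled_loss(model, x_weak, x_strong, tau=0.95):
    """Masked consistency loss on one unlabeled batch (illustrative sketch)."""
    # Teacher pass on the weak view; no gradients flow through the target.
    probs_weak = tf.stop_gradient(model(x_weak, training=False))
    max_probs = tf.reduce_max(probs_weak, axis=-1)
    pseudo_labels = tf.argmax(probs_weak, axis=-1)
    # Confidence mask: confident-but-WRONG pseudo-labels still pass here,
    # which is exactly the early-training failure mode described above.
    mask = tf.cast(max_probs >= tau, tf.float32)
    # Student must reproduce the pseudo-label on the strong view.
    probs_strong = model(x_strong, training=True)
    loss = tf.keras.losses.sparse_categorical_crossentropy(pseudo_labels, probs_strong)
    return tf.reduce_mean(loss * mask)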
The Proof is in the Paper: Experiments across SVHN, CIFAR-10, CIFAR-100, and FOOD101 showed FixMatch always lagged behind KD-FixMatch—especially with very few labels (Table 2).
KD-FixMatch Revealed: Your 3-Step Shield Against Noise
KD-FixMatch isn’t just different—it’s smarter. It replaces simultaneous training with a sequential knowledge distillation pipeline, creating a robust “trusted” dataset. Here’s the breakdown:
Step 1: Train the “Outer SNN” Teacher (Laying the Foundation)
- Train a standard FixMatch SNN (f_outer) on BOTH labeled and unlabeled data.
- Goal: Create a competent teacher model far superior to FixMatch’s untrained early teacher.
- Result: Generates higher-quality initial pseudo-labels.
Step 2: Curate Trusted Pseudo-Labels (The Noise Filter)
This is the game-changer. Don’t blindly trust f_outer! KD-FixMatch uses a rigorous 2-stage filter:
- High-Confidence Sampling:
  - Keep pseudo-labels ONLY where the model’s maximum prediction probability >= τ_select (e.g., 0.80).
  - Action: Filters out low-confidence, likely incorrect labels.
- Deep Embedding Clustering:
  - Extract latent features (pre-softmax layer) of the high-confidence samples.
  - Cluster these embeddings (e.g., using techniques like DEC).
  - Crucial Filter: Keep only samples where the cluster assignment matches the pseudo-label’s predicted class.
  - Action: Removes samples where confidence was high but semantically inconsistent within the embedding space.
# Conceptual Pseudo-Code: Trusted Pseudo-Label Selection
trusted_indices = []
for unlabeled_data in unlabeled_dataset:
    pseudo_label, features = outer_snn.predict(unlabeled_data)
    max_prob = max(pseudo_label)
    if max_prob >= tau_select:  # High Confidence Check
        cluster_id = clustering_model.predict(features)
        predicted_class = argmax(pseudo_label)
        if cluster_id == predicted_class:  # Semantic Consistency Check
            trusted_indices.append(unlabeled_data.index)
T_u = trusted_indices  # Final Trusted Subset
Note that raw cluster IDs are arbitrary, so in practice each cluster must first be mapped to a class (e.g., by majority vote over its members’ pseudo-labels) before the consistency check; the full implementation at the end of this guide does exactly that.
Step 3: Train the “Inner SNN” Student (Powering Up Performance)
- Initialize a new SNN (f_inner).
- Train it on a powerful mix:
  - Original Labeled Data
  - Original Unlabeled Data (using FixMatch consistency)
  - The Trusted Dataset (T_u) with high-quality pseudo-labels from the Outer SNN.
- Key Innovation: f_inner starts training with significantly cleaner pseudo-labels than FixMatch’s student ever sees early on. It also benefits from consistency regularization on all unlabeled data.
- Conflict Resolution: If f_inner later generates its own high-confidence pseudo-label for data in T_u, a heuristic favors the inner student’s label (assumed more refined); see the sketch after this list.
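The paper describes this heuristic only at a high level; here is a minimal sketch of one plausible rule, assuming the inner student’s label wins whenever it clears its own confidence threshold (the function name and `tau_inner` default are illustrative):
import numpy as np

def resolve_trusted_label(outer_label, inner_probs, tau_inner=0.95):
    """Prefer the inner student's pseudo-label once it is highly confident.
    outer_label: int class id stored in the trusted set T_u.
    inner_probs: softmax vector from the inner SNN for the same sample.
    """
    inner_label = int(np.argmax(inner_probs))
    # Assumption: "more refined" means the inner model clears its own
    # confidence threshold; otherwise keep the outer model's label.
    if inner_probs[inner_label] >= tau_inner:
        return inner_label
    return outer_label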
Crushing the Benchmarks: KD-FixMatch’s Undeniable Proof
Amazon researchers tested rigorously across 4 standard datasets. The results speak volumes:
Table: Test Accuracy Comparison (Mean ± Std Dev %)
| Method | 250 Labels | 1000 Labels | 2000 Labels | 4000 Labels |
| --- | --- | --- | --- | --- |
| Baseline (Supervised Only) | 33.47 ± 4.62 | 44.18 ± 4.40 | 82.62 ± 1.90 | 93.37 ± 0.24 |
| FixMatch | 84.96 ± 0.75 | 91.59 ± 0.58 | 94.25 ± 0.17 | 95.09 ± 0.13 |
| KD-FixMatch-CE | 87.30 ± 0.83 | 92.92 ± 0.86 | 94.38 ± 0.29 | 98.18 ± 0.18 |
| KD-FixMatch-SCE-1.0-0.01 | 87.53 ± 0.47 | 93.03 ± 0.54 | 95.04 ± 0.18 | 96.33 ± 0.23 |
| KD-FixMatch-SCE-1.0-0.1 | 87.55 ± 0.41 | 93.19 ± 0.32 | 95.50 ± 0.41 | 96.34 ± 0.13 |
The Takeaways Scream Success:
- KD-FixMatch BEATS FixMatch EVERY Time: Regardless of dataset or number of labels, KD-FixMatch variants consistently outperformed FixMatch.
- Massive Gains Where It Matters Most: With only 250 labels (extreme SSL), KD-FixMatch delivered a +2.59% to +2.89% absolute accuracy boost. This is HUGE in low-data regimes.
- Robust Losses Add Extra Juice: Using Symmetric Cross Entropy (SCE) instead of standard CE (KD-FixMatch-SCE) provided small but consistent extra gains, showcasing the synergy with noise robustness.
- The “Better Starting Point” Wins: KD-FixMatch’s core advantage, providing the inner student with cleaner pseudo-labels from the start, is decisively proven.
If you’re interested in advanced semi-supervised learning methods, you may also find this article helpful: 7 Powerful Reasons Why BaCon Outperforms and Fixes Broken Semi-Supervised Learning Systems
Beyond the Basics: Why KD-FixMatch is a Game Changer
- Solves the Early Noise Catastrophe: By decoupling teacher training via the outer SNN, it avoids poisoning the inner student from day one. This is FixMatch’s Achilles’ heel.
- Smarter Than Simple Self-Training: Unlike methods such as Noisy Student, which retrain the entire model multiple times to refine pseudo-labels, KD-FixMatch trains the outer and inner networks only once each. This is computationally more efficient in the long term.
- Leverages Knowledge Distillation Power: It explicitly uses the principle of transferring knowledge from a stronger teacher (f_outer) to a student (f_inner), but within a cohesive SSL framework.
- Synergy with Robust Losses: The architecture naturally accommodates robust losses (like MAE, SCE, NCE) in the inner SNN, further hardening the model against any residual noise in pseudo-labels or T_u. The paper showed SCE provided easy wins; see the sketch after this list.
- Practical Impact: For industries like e-commerce (product defect detection, mentioned in the paper), healthcare (medical image analysis with few labels), or autonomous driving (rare event detection), even small accuracy gains translate to massive cost savings, safety improvements, and customer satisfaction.
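For concreteness, here is a minimal sketch of the Symmetric Cross Entropy loss in TensorFlow, assuming one-hot targets and the α = 1.0, β = 0.1 weighting from the best-performing variant above (the clipping constant is a common SCE implementation detail, not a value from the paper):
import tensorflow as tf

def symmetric_cross_entropy(y_true, y_pred, alpha=1.0, beta=0.1):
    """SCE = alpha * CE(y, p) + beta * reverse CE(p, y)."""
    ce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
    # Reverse CE swaps prediction and target; clip the one-hot target so
    # log(0) stays finite, which is what makes the loss noise-tolerant.
    rce = tf.keras.losses.categorical_crossentropy(
        y_pred, tf.clip_by_value(y_true, 1e-4, 1.0))
    return alpha * ce + beta * rce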
KD-FixMatch vs. The Competition: Where It Stands Tall
- vs. FixMatch: Clear winner. Solves FixMatch’s critical early noise vulnerability via sequential training and trusted data curation. (+2.6% to +5.7% gains).
- vs. Noisy Student: More efficient. Noisy Student requires multiple full retraining cycles. KD-FixMatch trains outer/inner nets once. Comparable or better performance.
- vs. Self-Training: More sophisticated and robust. Simple self-training lacks consistency regularization and sophisticated pseudo-label filtering like deep embedding clustering.
- vs. Mean Teacher: Different approach. Mean Teacher uses EMA weights for targets (see the EMA sketch below), while KD-FixMatch uses sequential knowledge distillation and an explicit high-quality pseudo-label set (T_u). Results suggest KD-FixMatch is superior.
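For contrast, the EMA teacher that Mean Teacher relies on (and that the SNN in the full implementation below also maintains) is just an exponential moving average of the student’s weights; a minimal sketch with the 0.999 decay used in the code below:
def update_ema_weights(student_weights, ema_weights, decay=0.999):
    """theta_ema <- decay * theta_ema + (1 - decay) * theta_student."""
    return [decay * e + (1.0 - decay) * s
            for s, e in zip(student_weights, ema_weights)]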
Implementing KD-FixMatch: Key Considerations & Challenges
While powerful, be mindful of these aspects:
- Increased Compute (2x Training): Requires training the Outer SNN before the Inner SNN. This roughly doubles training time compared to FixMatch. (The paper acknowledges this but argues the performance gains justify it, especially when labeling costs are high).
- Parameter Tuning: Selecting thresholds (τ_select, τ_inner) and robust loss parameters (α, β for SCE) needs validation set tuning. Start with paper values (τ_select = 0.80, τ_inner = 0.95, SCE α = 1.0, β = 0.1).
- Clustering Choice: The effectiveness of the deep embedding clustering step depends on the chosen algorithm and its parameters. DEC is common, but alternatives exist; the sketch after this list shows how easily the clusterer can be swapped.
- Dataset Suitability: Benefits are most pronounced when pseudo-label noise in standard SSL is high (e.g., complex datasets, very few labels).
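As a small illustration of the clustering choice, here is a hedged sketch using scikit-learn, where KMeans or spectral clustering stands in for DEC on the extracted embeddings (the factory function and parameter values are illustrative, not from the paper):
from sklearn.cluster import KMeans, SpectralClustering

def make_clusterer(method, n_classes):
    """Return a clustering model for the embedding-consistency filter."""
    if method == "kmeans":
        return KMeans(n_clusters=n_classes, n_init=10, random_state=0)
    if method == "spectral":
        return SpectralClustering(n_clusters=n_classes, random_state=0)
    raise ValueError(f"Unknown clustering method: {method}")

# Usage sketch: cluster the high-confidence embeddings, then keep only
# samples whose cluster's majority pseudo-label matches their own.
# cluster_ids = make_clusterer("kmeans", 10).fit_predict(embeddings)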
The Verdict: The computational overhead is a trade-off, but for critical applications where accuracy is paramount or labeling is prohibitively expensive, KD-FixMatch is a compelling, often necessary, choice.
Conclusion: Stop Settling for Noisy Labels – Embrace KD-FixMatch
The era of letting noisy pseudo-labels cripple your semi-supervised learning models is over. KD-FixMatch delivers a proven, systematic solution:
- Sequential Knowledge Distillation: Trains a competent Outer Teacher first.
- Military-Grade Pseudo-Label Curation: Combines high-confidence sampling AND deep embedding clustering for unmatched label quality (T_u).
- Supercharged Inner Student: Trains on labeled data, unlabeled data (consistency), and the trusted T_u set.
- Proven Performance: Delivers consistent, significant accuracy gains (up to +5.7%) over FixMatch across diverse datasets, especially under the extreme low-label scenarios where SSL matters most.
Don’t waste resources training on noise. KD-FixMatch provides the cleaner starting point and robust framework needed to unlock the true potential of your unlabeled data. The benchmarks don’t lie – it’s time to upgrade your SSL pipeline.
Ready to Slash Noise & Boost Your SSL Accuracy?
- Implement KD-FixMatch: Grab the core concepts from this guide and the original paper (arXiv:2309.05826v1).
- Share Your Challenge: Struggling with noisy labels in your specific domain? Describe your problem in the comments below!
- Spread the Knowledge: Found this guide helpful? Share it with your colleagues battling the semi-supervised learning frontier! #SSL #KnowledgeDistillation #MachineLearning #DeepLearning #NoisyLabels #FixMatch #KDFixMatch
Here’s a reference implementation sketch of KD-FixMatch based on the research paper, using TensorFlow/Keras with detailed comments. Treat it as a sketch under stated assumptions rather than the authors’ exact code: the EfficientNet-B0 backbone, KMeans standing in for DEC in the clustering step, and most hyperparameter values are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, models, applications
from tensorflow.keras import mixed_precision
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Enable mixed precision for faster training
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
# =====================
# Configuration
# =====================
class Config:
# Dataset parameters
NUM_CLASSES = 10
INPUT_SHAPE = (32, 32, 3)
# Training parameters
BATCH_SIZE = 128
LABELED_BATCH = 64
UNLABELED_BATCH = 64
EPOCHS_OUTER = 100
EPOCHS_INNER = 100
LEARNING_RATE = 7e-5
WEIGHT_DECAY = 5e-4
# Augmentation parameters
WEAK_AUG = ["flip_left_right", "translate", "zoom"]
STRONG_AUG = ["flip_left_right", "translate", "zoom", "rotate",
"solarize", "color", "contrast", "sharpness"]
# Thresholds
TAU_SELECT = 0.80 # For trusted pseudo-label selection
TAU_INNER = 0.95 # For inner SNN training
TAU = 0.95 # For FixMatch consistency
LAMBDA_U = 1.0 # Unsupervised loss weight
# Clustering
CLUSTERING_METHOD = "kmeans" # Options: "kmeans", "spectral"
# Model architecture
BACKBONE = "efficientnetb0"
config = Config()
# =====================
# Data Augmentation
# =====================
class Augmentation:
@staticmethod
def weak_augment(image):
image = tf.image.random_flip_left_right(image)
image = tf.image.random_crop(image, size=[tf.shape(image)[0],
config.INPUT_SHAPE[0]-4,
config.INPUT_SHAPE[1]-4,
config.INPUT_SHAPE[2]])
image = tf.image.resize(image, config.INPUT_SHAPE[:2])
return image
@staticmethod
def strong_augment(image):
image = Augmentation.weak_augment(image)
image = tf.image.random_brightness(image, max_delta=0.8)
image = tf.image.random_saturation(image, lower=0.1, upper=1.9)
image = tf.image.random_contrast(image, lower=0.1, upper=1.9)
image = tf.image.random_hue(image, max_delta=0.2)
        image = tf.clip_by_value(image, 0.0, 1.0)  # inputs are scaled to [0, 1]
return image
# =====================
# Model Architecture
# =====================
def create_model():
"""Create EfficientNet-B0 backbone with multi-output (logits + embeddings)"""
base_model = applications.EfficientNetB0(
include_top=False,
weights='imagenet',
input_shape=config.INPUT_SHAPE,
pooling='avg'
)
inputs = layers.Input(shape=config.INPUT_SHAPE)
x = base_model(inputs)
embeddings = layers.Dense(256, activation='relu', name='embedding_layer')(x)
logits = layers.Dense(config.NUM_CLASSES, name='logits')(embeddings)
outputs = layers.Activation('softmax', dtype='float32', name='softmax')(logits)
return models.Model(inputs, [outputs, embeddings], name='efficientnet_snn')
# =====================
# Siamese Network (SNN)
# =====================
class SiameseNetwork(tf.keras.Model):
    def __init__(self, model, loss_fn=None, **kwargs):
        super(SiameseNetwork, self).__init__(**kwargs)
        self.model = model
        self.ema_model = models.clone_model(model)
        self.ema_model.set_weights(model.get_weights())
        self.ema_decay = 0.999
        self.tau = config.TAU
        self.lambda_u = config.LAMBDA_U
        # Pluggable unsupervised loss so the inner SNN can train with a
        # robust loss (e.g., SCE) instead of plain cross-entropy
        self.loss_fn = loss_fn or tf.keras.losses.categorical_crossentropy
def train_step(self, data):
# Unpack data
labeled_images, labels = data[0]
unlabeled_images = data[1]
# Process labeled data
with tf.GradientTape() as tape:
# Weak augmentation for labeled data
labeled_weak = Augmentation.weak_augment(labeled_images)
preds, _ = self.model(labeled_weak, training=True)
            supervised_loss = tf.reduce_mean(
                tf.keras.losses.categorical_crossentropy(labels, preds))
            # Process unlabeled data. The zipped dataset always yields an
            # unlabeled batch, so no emptiness check is needed (a Python
            # `if` on a tensor would also break graph-mode training).
            # Weak and strong augmentations
            unlabeled_weak = Augmentation.weak_augment(unlabeled_images)
            unlabeled_strong = Augmentation.strong_augment(unlabeled_images)
            # Teacher predictions from the EMA model (no gradient)
            teacher_probs, _ = self.ema_model(unlabeled_weak, training=False)
            # Pseudo-label with confidence threshold
            max_probs = tf.reduce_max(teacher_probs, axis=-1)
            mask = tf.cast(max_probs >= self.tau, tf.float32)
            pseudo_labels = tf.one_hot(
                tf.argmax(teacher_probs, axis=-1),
                depth=config.NUM_CLASSES
            )
            # Student predictions on strong augmentation
            student_probs, _ = self.model(unlabeled_strong, training=True)
            # Masked unsupervised consistency loss (pluggable: CE or SCE)
            unsupervised_loss = self.loss_fn(pseudo_labels, student_probs)
            unsupervised_loss = tf.reduce_mean(unsupervised_loss * mask) * self.lambda_u
            total_loss = supervised_loss + unsupervised_loss
# Update weights
grads = tape.gradient(total_loss, self.model.trainable_weights)
self.optimizer.apply_gradients(zip(grads, self.model.trainable_weights))
# Update EMA model
for weight, ema_weight in zip(self.model.weights, self.ema_model.weights):
ema_weight.assign(self.ema_decay * ema_weight + (1 - self.ema_decay) * weight)
return {
"total_loss": total_loss,
"supervised_loss": supervised_loss,
"unsupervised_loss": unsupervised_loss
}
# =====================
# KD-FixMatch Algorithm
# =====================
class KDFixMatch:
def __init__(self):
self.outer_model = None
self.inner_model = None
self.trusted_set = None
def train_outer_snn(self, labeled_data, unlabeled_data):
"""Train outer SNN using FixMatch approach"""
print("Training Outer SNN...")
model = create_model()
self.outer_model = SiameseNetwork(model)
self.outer_model.compile(
optimizer=tf.keras.optimizers.Adam(
learning_rate=config.LEARNING_RATE,
beta_1=0.9,
beta_2=0.999
)
)
# Create dataset
full_dataset = tf.data.Dataset.zip((
labeled_data.shuffle(10000).repeat(),
unlabeled_data.shuffle(10000).repeat()
)).batch(config.BATCH_SIZE)
# Training loop
self.outer_model.fit(
full_dataset,
epochs=config.EPOCHS_OUTER,
steps_per_epoch=max(len(labeled_data) // config.BATCH_SIZE, 1),
verbose=2
)
def select_trusted_pseudo_labels(self, unlabeled_data):
"""Select high-quality pseudo-labels using confidence and clustering"""
print("Selecting trusted pseudo-labels...")
# Step 1: High-confidence selection
embeddings_list = []
pseudo_labels_list = []
indices = []
        offset = 0  # running position in the full unlabeled set
        for batch in unlabeled_data.batch(config.BATCH_SIZE):
            weak_aug = Augmentation.weak_augment(batch)
            probs, embeddings = self.outer_model.ema_model(weak_aug, training=False)
            max_probs = tf.reduce_max(probs, axis=-1)
            high_conf_mask = max_probs >= config.TAU_SELECT
            embeddings_list.append(tf.boolean_mask(embeddings, high_conf_mask))
            pseudo_labels_list.append(
                tf.boolean_mask(tf.argmax(probs, axis=-1), high_conf_mask))
            # Record GLOBAL dataset indices (batch-local indices would
            # point at the wrong images when building the trusted set)
            batch_size = int(tf.shape(batch)[0])
            indices.extend(
                offset + i for i in range(batch_size) if bool(high_conf_mask[i]))
            offset += batch_size
embeddings = tf.concat(embeddings_list, axis=0).numpy()
pseudo_labels = tf.concat(pseudo_labels_list, axis=0).numpy()
        # Step 2: Deep embedding clustering (KMeans is a simple stand-in
        # for DEC here; see "Clustering Choice" above)
        cluster_labels = KMeans(n_clusters=config.NUM_CLASSES, random_state=0).fit_predict(embeddings)
# Step 3: Consensus filtering
trusted_indices = []
cluster_pseudo_map = {}
# Find majority class in each cluster
for cluster_id in range(config.NUM_CLASSES):
cluster_mask = (cluster_labels == cluster_id)
if np.sum(cluster_mask) == 0:
continue
cluster_pseudo = pseudo_labels[cluster_mask]
majority_class = np.argmax(np.bincount(cluster_pseudo))
cluster_pseudo_map[cluster_id] = majority_class
# Select samples where pseudo-label matches cluster majority
class_match_mask = (cluster_pseudo == majority_class)
trusted_indices_in_cluster = np.where(cluster_mask)[0][class_match_mask]
trusted_indices.extend(trusted_indices_in_cluster.tolist())
# Create trusted dataset
trusted_images = []
trusted_labels = []
        # unlabeled_data holds single images (not batches), so iterate directly
        unlabeled_images = list(unlabeled_data.as_numpy_iterator())
for idx in trusted_indices:
trusted_images.append(unlabeled_images[indices[idx]])
trusted_labels.append(pseudo_labels[idx])
# Convert to one-hot labels
trusted_labels = tf.one_hot(trusted_labels, depth=config.NUM_CLASSES)
        self.trusted_set = tf.data.Dataset.from_tensor_slices(
            (np.array(trusted_images), trusted_labels))
        print(f"Selected {len(trusted_images)} trusted samples "
              f"({len(trusted_images) / len(unlabeled_images) * 100:.2f}% of unlabeled data)")
return self.trusted_set
def train_inner_snn(self, labeled_data, unlabeled_data, robust_loss=False):
"""Train inner SNN with trusted pseudo-labels"""
print("Training Inner SNN with trusted pseudo-labels...")
# Create new labeled dataset (original + trusted)
combined_labeled = labeled_data.concatenate(self.trusted_set)
        # Per Step 3 of the method, the full unlabeled set is kept for
        # consistency regularization; trusted samples additionally appear,
        # with their pseudo-labels, in the combined labeled set above.
        # Create model; plug the robust SCE loss into the unsupervised term
        # if requested (alpha = 1.0, beta = 0.1, the paper's suggested values)
        model = create_model()
        if robust_loss:
            def sce_loss(y_true, y_pred):
                # Symmetric Cross Entropy: alpha * CE + beta * reverse CE
                ce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
                # Clip the one-hot target so log(0) in the reverse term stays finite
                rce = tf.keras.losses.categorical_crossentropy(
                    y_pred, tf.clip_by_value(y_true, 1e-4, 1.0))
                return 1.0 * ce + 0.1 * rce
            self.inner_model = SiameseNetwork(model, loss_fn=sce_loss)
        else:
            self.inner_model = SiameseNetwork(model)
self.inner_model.compile(
optimizer=tf.keras.optimizers.Adam(
learning_rate=config.LEARNING_RATE,
beta_1=0.9,
beta_2=0.999
)
)
# Create dataset
full_dataset = tf.data.Dataset.zip((
combined_labeled.shuffle(10000).repeat(),
unlabeled_data.shuffle(10000).repeat()
)).batch(config.BATCH_SIZE)
# Training loop
history = self.inner_model.fit(
full_dataset,
epochs=config.EPOCHS_INNER,
steps_per_epoch=max(len(combined_labeled) // config.BATCH_SIZE, 1),
verbose=2
)
return history
# =====================
# Evaluation Metrics
# =====================
def evaluate_model(model, test_data):
"""Evaluate model performance on test set"""
acc_metric = tf.keras.metrics.CategoricalAccuracy()
for images, labels in test_data.batch(config.BATCH_SIZE):
probs, _ = model.ema_model(images, training=False)
acc_metric.update_state(labels, probs)
return acc_metric.result().numpy()
# =====================
# Main Workflow
# =====================
if __name__ == "__main__":
# Load datasets (example with CIFAR-10)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
# Preprocess data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = tf.keras.utils.to_categorical(y_train, config.NUM_CLASSES)
y_test = tf.keras.utils.to_categorical(y_test, config.NUM_CLASSES)
# Split into labeled and unlabeled (simulate 1000 labeled examples)
labeled_indices = np.random.choice(
len(x_train), size=1000, replace=False)
unlabeled_indices = np.setdiff1d(np.arange(len(x_train)), labeled_indices)
labeled_data = tf.data.Dataset.from_tensor_slices(
(x_train[labeled_indices], y_train[labeled_indices]))
unlabeled_data = tf.data.Dataset.from_tensor_slices(
x_train[unlabeled_indices])
test_data = tf.data.Dataset.from_tensor_slices((x_test, y_test))
# Initialize KD-FixMatch
kdfm = KDFixMatch()
# Step 1: Train Outer SNN
kdfm.train_outer_snn(labeled_data, unlabeled_data)
# Evaluate Outer SNN
outer_acc = evaluate_model(kdfm.outer_model, test_data)
print(f"Outer SNN Test Accuracy: {outer_acc:.4f}")
# Step 2: Select trusted pseudo-labels
trusted_set = kdfm.select_trusted_pseudo_labels(unlabeled_data)
# Step 3: Train Inner SNN with robust loss
history = kdfm.train_inner_snn(labeled_data, unlabeled_data, robust_loss=True)
# Evaluate Inner SNN
inner_acc = evaluate_model(kdfm.inner_model, test_data)
print(f"Inner SNN Test Accuracy: {inner_acc:.4f}")
print(f"Accuracy Improvement: {(inner_acc - outer_acc)*100:.2f}%")
# Visualization of results
plt.figure(figsize=(12, 5))
# Plot accuracy comparison
plt.subplot(1, 2, 1)
    model_names = ['Outer SNN', 'Inner SNN']  # avoid shadowing the keras 'models' import
    accuracies = [outer_acc, inner_acc]
    plt.bar(model_names, accuracies, color=['blue', 'green'])
plt.title('Model Accuracy Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)
# Plot loss history
plt.subplot(1, 2, 2)
plt.plot(history.history['total_loss'], label='Total Loss')
plt.plot(history.history['supervised_loss'], label='Supervised Loss')
plt.plot(history.history['unsupervised_loss'], label='Unsupervised Loss')
plt.title('Training Loss History')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.tight_layout()
plt.savefig('kdfixmatch_results.png', dpi=300)
plt.show()