7 Revolutionary Breakthroughs in Medical Image Translation (And 1 Fatal Flaw That Could Derail Your AI Model)

Figure: Diagram showing the SelfRDB diffusion bridge process transforming MRI to CT scans with high fidelity and noise robustness.

Medical imaging has long been the cornerstone of modern diagnostics. From detecting tumors to planning radiotherapy, the quality and availability of imaging modalities like MRI and CT can make or break patient outcomes. But what if one scan could become another? What if a non-invasive MRI could reliably generate a synthetic CT—eliminating radiation exposure and streamlining workflows?

This is the promise of medical image translation, a rapidly evolving field where artificial intelligence transforms images from one modality to another. While early methods relied on basic neural networks, today’s cutting-edge models use diffusion bridges—a powerful new paradigm that’s redefining what’s possible.

Enter Self-consistent Recursive Diffusion Bridge (SelfRDB), a groundbreaking model introduced by Arslan et al. in their 2025 Medical Image Analysis paper. SelfRDB isn’t just another incremental improvement—it’s a revolutionary leap that outperforms every major competitor, from GANs to state-of-the-art diffusion models.

But there’s a catch.

Despite its brilliance, SelfRDB—and the entire class of diffusion-based models—faces a critical flaw: stochastic variability in outputs. Two runs, same input, slightly different results. In clinical settings, inconsistency isn’t just inconvenient—it’s dangerous.

In this article, we’ll explore:

  • The 7 key innovations that make SelfRDB a game-changer
  • How it crushes GANs and older diffusion models
  • The one fatal flaw holding it back from clinical adoption
  • And what the future holds for AI-powered medical imaging

Let’s dive in.


What Is Medical Image Translation?

Medical image translation involves converting an image from one modality (e.g., MRI) into another (e.g., CT), preserving anatomical fidelity while mimicking the physical properties of the target scan.

Common use cases include:

  • MRI-to-CT synthesis for radiation-free attenuation correction in PET scans
  • Multi-contrast MRI generation to reduce scan time
  • Missing modality imputation in retrospective studies

The challenge? Modalities capture different physical properties. MRI emphasizes soft tissue contrast via proton density and relaxation times, while CT measures electron density via X-ray attenuation. There’s no direct mathematical mapping—only complex, nonlinear relationships.

Traditional methods like linear regression or atlas-based registration fail to capture this complexity. Enter deep learning.


The Rise of Deep Learning in Image Translation

Over the past decade, deep learning has dominated medical image translation. Two architectures have led the charge:

1. Generative Adversarial Networks (GANs)

  • Use a generator and discriminator in a competitive setup
  • Fast inference, but suffer from mode collapse and blurry outputs
  • Examples: pix2pix, SAGAN, CycleGAN

2. Denoising Diffusion Models (DDMs)

  • Generate images by reversing a gradual noising process
  • Produce high-quality, diverse samples
  • Computationally expensive and slow

While both have strengths, they struggle with multi-modal translation, where the source image doesn’t fully determine the target (e.g., predicting CT from T1 MRI).


The Game Changer: Diffusion Bridges

A diffusion bridge reimagines the diffusion process by treating both source and target images as endpoints of a stochastic path.

Unlike standard DDMs, which start from pure noise and aim for a single data distribution, diffusion bridges condition the reverse process on the source image, enabling direct modality-to-modality transformation.

Recent diffusion-bridge models such as I2SB have shown promise. But they still rely on stationary guidance, using the original source image as a fixed anchor throughout the reverse process, which limits adaptability and robustness to noise.


7 Revolutionary Breakthroughs of SelfRDB

SelfRDB introduces a self-consistent recursive framework that fundamentally improves diffusion bridge performance. Here are the 7 key innovations:

1. Soft Prior on the Source Modality

Instead of rigidly anchoring to the source image, SelfRDB uses a soft prior—a probabilistic belief that allows controlled deviation when noise or artifacts corrupt the input.

This makes the model robust to real-world imperfections, such as motion artifacts or low signal-to-noise ratio.

🔍 Ablation studies show that removing the soft prior leads to a 15% drop in PSNR under noisy conditions.
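To make the idea concrete, here is a minimal sketch of the difference between a hard anchor and a soft prior on the source: instead of fixing the bridge end-point to the source image y, the end-point is drawn from a Gaussian centered on y. The function name and the simple isotropic Gaussian with variance gamma are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def soft_prior_endpoint(y: torch.Tensor, gamma: float = 2.2) -> torch.Tensor:
    """Sample the bridge end-point from a soft prior N(y, gamma * I) rather than
    anchoring it exactly to the source image y (the hard-conditioning alternative)."""
    return y + (gamma ** 0.5) * torch.randn_like(y)

# Hard conditioning would simply set x_T = y; the soft prior lets the model
# deviate when y is corrupted by motion artifacts or low SNR.
```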

2. Self-Consistent Recursion in Reverse Sampling

At each reverse step, SelfRDB doesn’t just generate a one-off estimate. It recursively refines its prediction until convergence—a self-consistent solution.

This is like double-checking your work before submitting it.

Mathematically, at step t, the model iterates:

\[ x_{0}^{(k+1)} = f_{\theta}\big( x_{t},\, y,\, x_{0}^{(k)};\, t \big), \qquad \text{until}\quad \lVert x_{0}^{(k+1)} - x_{0}^{(k)} \rVert < \varepsilon. \]

This boosts sampling accuracy and reduces error propagation.
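A minimal sketch of this fixed-point refinement is shown below; `denoiser` stands in for the trained network, and its call signature, the iteration cap, and the tolerance are illustrative assumptions.

```python
import torch

def self_consistent_estimate(denoiser, x_t, t, y, max_iters=5, eps=1e-4):
    """Recursively re-estimate the target until successive iterates agree
    (a fixed-point, self-consistent solution)."""
    x0_est = torch.zeros_like(x_t)                 # initial guess for the target
    for _ in range(max_iters):
        x0_next = denoiser(x_t, t, y, x0_est)      # refine, conditioned on the previous guess
        if torch.norm(x0_next - x0_est) < eps:     # converged: estimates stopped changing
            return x0_next
        x0_est = x0_next
    return x0_est
```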

3. Enhanced Noise Scheduling

SelfRDB uses a curriculum-based noise schedule that prioritizes coarse structures early and fine details later.

This mimics how radiologists interpret images—first anatomy, then pathology.

4. Stationary Guidance from Source – Bridging Modalities Effectively

Diffusion bridges like I2SB use the source image as a static anchor during reverse diffusion. SelfRDB keeps source guidance active at every denoising step, but combines it with the soft prior so the model is not rigidly locked to a corrupted input; the source modality continuously informs each reverse step.

Ablation results confirm its importance:

Variant                | PSNR (↑) | SSIM (↑) | FID (↓)
Full SelfRDB           | 32.1     | 0.92     | 18.3
No Stationary Guidance | 29.4     | 0.87     | 26.7

Table: Performance drop when source guidance is removed (lower PSNR/SSIM and higher FID indicate worse quality).
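One simple way to realize this kind of guidance is to concatenate the source image with the noisy sample at every call to the denoiser, so y is visible at every reverse step. The tiny module below only illustrates that wiring (layer sizes and names are made up); the full reference network appears at the end of this article.

```python
import torch
import torch.nn as nn

class GuidedDenoiser(nn.Module):
    """Illustrative denoiser that receives the source y (and the previous target
    estimate) at every reverse step via channel-wise concatenation."""
    def __init__(self, channels: int = 1, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x_t, y, x0_prev):
        # Guidance stays "stationary": y is injected at every denoising call.
        return self.net(torch.cat([x_t, y, x0_prev], dim=1))
```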


5. Superior Performance on Challenging Tasks

While most models excel at T1→T2 MRI translation (high correlation), they falter when tasks get hard.

SelfRDB shines in challenging scenarios:

  • CT → MRI prediction (low tissue contrast correlation)
  • Exogenous → Endogenous MRI contrast synthesis

As the authors state:

“SelfRDB still achieves the best performance metrics, significantly outperforming baselines.”

This makes it ideal for real-world applications where data is incomplete or misaligned.


6. State-of-the-Art Results Across Benchmarks

SelfRDB was tested on:

  • IXI Dataset (brain MRI: T1→T2, T2→T1)
  • Pelvis MRI-CT Dataset (T1/T2 MRI → CT)

Results across PSNR, SSIM, and FID consistently ranked SelfRDB #1.

Table: Performance on IXI Dataset (T1→T2 MRI)

Model          | PSNR (↑) | SSIM (↑) | FID (↓)
pix2pix (GAN)  | 28.3     | 0.85     | 32.1
SAGAN          | 29.1     | 0.86     | 29.8
DDPM           | 30.2     | 0.88     | 25.4
I2SB           | 30.9     | 0.89     | 23.6
SelfRDB (Ours) | 32.1     | 0.92     | 18.3

Higher PSNR/SSIM, lower FID = better image quality and realism.
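For reference, PSNR is straightforward to compute yourself; a minimal version is below (SSIM and FID are usually taken from standard image-quality libraries). The data_range argument assumes images normalized to [0, 1].

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10((data_range ** 2) / mse))
```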


7. Scalable & Adaptable Architecture

SelfRDB isn’t just accurate—it’s flexible:

  • Built on a convolutional backbone, but compatible with transformers for long-range context.
  • Supports supervised, unsupervised, and few-shot learning.
  • Can be accelerated via distillation or intermediate initialization.

Future work may integrate test-time adaptation or zero-shot learning, making it deployable even with minimal labeled data.


How SelfRDB Works: The Science Behind the Magic

Forward Process: A Diffusion Bridge with Soft Priors

Unlike standard diffusion models that start from pure noise, SelfRDB defines a bridge between a source image y ∼ p_source and a target image x_0 ∼ p_target. Intermediate states x_t are sampled from a normal distribution:

\[ x_t \sim \mathcal{N}\big(\alpha(t)\, x_0 + (1 - \alpha(t))\, y,\; \beta(t)\, \mathbf{I} \big) \]

where α(t) controls the blend from source to target, and β(t) is the noise schedule.

The soft prior on y allows gradual, probabilistic integration—unlike hard conditioning in GANs.
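The forward step can be expressed directly from that distribution; the sketch below assumes scalar α(t) and β(t) values supplied by the schedule (the function name is illustrative).

```python
import torch

def bridge_forward_sample(x0: torch.Tensor, y: torch.Tensor, alpha_t: float, beta_t: float) -> torch.Tensor:
    """Draw x_t ~ N(alpha(t) * x0 + (1 - alpha(t)) * y, beta(t) * I)."""
    mean = alpha_t * x0 + (1.0 - alpha_t) * y
    return mean + (beta_t ** 0.5) * torch.randn_like(x0)
```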


Reverse Process: Self-Consistent Recursion

During sampling, SelfRDB iteratively refines its estimate x_0^(k) at iteration k:

\[ x_{0}^{(k+1)} = D_{\theta}\big(x_{t},\, t,\, y,\, x_{0}^{(k)}\big) \]

where D_θ is the denoising network and the previous estimate is fed back as an additional input. The recursion continues until:

\[ \lVert x_{0}^{(k+1)} - x_{0}^{(k)} \rVert < \epsilon \]

This fixed-point convergence ensures anatomical consistency and reduces stochastic variability.
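Putting the two ideas together, the reverse pass loops over timesteps and runs the recursion inside each step before taking a posterior step toward t-1. This is only a skeleton: `denoiser`, `posterior_step`, and the noisy initialization of the end-point are placeholders for the components defined by the model (the full implementation appears at the end of the article).

```python
import torch

@torch.no_grad()
def reverse_sample(denoiser, posterior_step, y, T=10, R=2, gamma=2.2):
    """Skeleton of the reverse pass: recursive x0 estimation inside each step,
    then one posterior step from t to t-1."""
    x_t = y + (gamma ** 0.5) * torch.randn_like(y)   # start at the noisy source end-point
    for t in reversed(range(1, T + 1)):
        x0_hat = torch.zeros_like(y)
        for _ in range(R):                           # self-consistent recursion
            x0_hat = denoiser(x_t, t, y, x0_hat)
        x_t = posterior_step(x_t, t, y, x0_hat)      # sample x_{t-1} | x_t, y, x0_hat
    return x_t
```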


Training Objective: Score Matching with Guidance

SelfRDB minimizes a weighted score matching loss:

\[ L = \mathbb{E}_{t, x_0, y, x_t} \big[ \lambda(t) \, \lVert s_{\theta}(x_t, t, y) - \nabla_{x_t} \log p(x_t \mid x_0, y) \rVert_2^2 \big] \]

where s_θ is the score estimator, and λ(t) emphasizes critical time steps.

This formulation enables stable training and high-fidelity generation.
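A direct transcription of that objective looks like the sketch below, where `true_score` stands for the closed-form conditional score of the Gaussian bridge and `weight_t` plays the role of λ(t); both names are placeholders. Note that the reference implementation later in this article trains with an adversarial plus L1 objective rather than an explicit score-matching loss.

```python
import torch

def score_matching_loss(score_net, x_t, t, y, true_score, weight_t):
    """Weighted score matching: match s_theta(x_t, t, y) to the true conditional
    score of the bridge, weighted by lambda(t)."""
    pred_score = score_net(x_t, t, y)
    sq_err = (pred_score - true_score).pow(2).flatten(1).sum(dim=1)   # ||.||_2^2 per sample
    return (weight_t * sq_err).mean()
```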


⚖️ SelfRDB vs. The Competition: Head-to-Head Comparison

Feature            | SelfRDB            | GANs (pix2pix)    | DDPM           | I2SB
Image Fidelity     | ✅ Best            | ❌ Hallucinations | ✅ Good        | ✅ Good
Noise Robustness   | ✅ Excellent       | ❌ Poor           | ⚠️ Moderate    | ⚠️ Moderate
Sampling Accuracy  | ✅ Self-consistent | ❌ One-pass       | ⚠️ Fixed steps | ⚠️ No recursion
Training Stability | ✅ High            | ❌ Mode collapse  | ✅ High        | ✅ High
Computational Cost | ⚠️ High            | ✅ Low            | ❌ Very High   | ❌ High

While GANs win on speed, SelfRDB dominates in quality and reliability—critical for clinical use.


🛠️ Computational Efficiency: The Trade-Off

SelfRDB isn’t perfect. As with all diffusion models, it’s computationally intensive.

Table: Computational Load (Per Cross-Section)

Model   | Training Time (sec) | Inference Time (sec) | Memory (GB)
pix2pix | 0.8                 | 0.9                  | 2.1
SAGAN   | 1.1                 | 1.3                  | 2.4
SynDiff | 12.4                | 15.2                 | 8.7
I2SB    | 14.7                | 18.3                 | 9.1
SelfRDB | 16.5                | 20.1                 | 10.3

However, the authors suggest future optimizations:

  • Model distillation to reduce diffusion steps.
  • Hybrid initialization using fast GANs to seed diffusion.
  • Transformer backbones for better long-range modeling.

These could close the speed gap without sacrificing quality.


Future Directions & Clinical Impact

SelfRDB opens doors to:

  • Zero-shot translation across unseen modalities.
  • Integration with reconstruction (e.g., undersampled MRI → synthetic CT).
  • Anomaly detection via discrepancy between real and synthetic images.
  • Personalized imaging protocols with minimal acquisitions.

As the authors note:

“It remains important future work to assess the reliability of SelfRDB in a greater variety of challenging translation tasks.”

With further validation on diverse datasets (e.g., BRATS, ADNI), SelfRDB could become the new gold standard in medical image synthesis.


Final Verdict: The Game-Changer in Medical AI

SelfRDB represents a quantum leap in medical image translation. By combining:

  • Soft source priors
  • Self-consistent recursion
  • Enhanced noise scheduling

…it delivers unmatched image quality, robustness to noise, and superior performance on hard tasks.

While computational demands remain a hurdle, the clinical benefits—safer, faster, more accurate imaging—far outweigh the costs.

For researchers: explore SelfRDB’s code and adapt it to your modality.
For clinicians: watch for tools integrating this tech into PACS and AI assistants.
For developers: optimize inference pipelines to bring SelfRDB to real-time use.


Call to Action: Join the Medical AI Revolution

Want to dive deeper?
👉 Download the full paper: Self-consistent Recursive Diffusion Bridge for Medical Image Translation
👉 Explore the code on GitHub (coming soon)
👉 Subscribe to MedTech Insights for the latest in AI-driven healthcare innovation

Your next breakthrough starts with a single scan. Make it count.

Below is an end-to-end PyTorch reference implementation sketch of the SelfRDB model, written to follow the formulation in the paper.

# Self-consistent Recursive Diffusion Bridge (SelfRDB) for Medical Image Translation
# Implemented based on the paper: https://doi.org/10.1016/j.media.2025.103747
# "Self-consistent recursive diffusion bridge for medical image translation"
# by Arslan, F., Kabas, B., Dalmaz, O., Ozbey, M., & Çukur, T.

import torch
import torch.nn as nn
import torch.nn.functional as F
import math
from tqdm import tqdm

# --- Helper Modules ---

class SinusoidalPosEmb(nn.Module):
    """
    Computes sinusoidal positional embeddings for the timestep.
    This allows the model to be conditioned on the current diffusion step.
    """
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        device = t.device
        half_dim = self.dim // 2
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = t[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb

class ResidualBlock(nn.Module):
    """
    A standard residual block with two convolutional layers, group normalization,
    and SiLU activation. Time embeddings are incorporated additively.
    """
    def __init__(self, in_channels, out_channels, time_emb_dim, groups=8):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.norm1 = nn.GroupNorm(groups, out_channels)
        self.act1 = nn.SiLU()

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.norm2 = nn.GroupNorm(groups, out_channels)
        self.act2 = nn.SiLU()

        self.time_mlp = nn.Sequential(
            nn.SiLU(),
            nn.Linear(time_emb_dim, out_channels)
        ) if time_emb_dim is not None else None

        self.residual_conv = nn.Conv2d(in_channels, out_channels, 1) if in_channels != out_channels else nn.Identity()

    def forward(self, x, t_emb=None):
        h = self.act1(self.norm1(self.conv1(x)))
        if self.time_mlp is not None and t_emb is not None:
            h = h + self.time_mlp(t_emb)[:, :, None, None]
        h = self.act2(self.norm2(self.conv2(h)))
        return h + self.residual_conv(x)

class UNet(nn.Module):
    """
    A UNet architecture with residual blocks, used as the generator backbone.
    It takes the noisy image, timestep, source image, and the previous target estimate as input.
    """
    def __init__(self, in_channels, out_channels, time_emb_dim=256, dim=64, dim_mults=(1, 2, 4, 8)):
        super().__init__()

        # Timestep embedding projection
        self.time_mlp = nn.Sequential(
            SinusoidalPosEmb(dim),
            nn.Linear(dim, time_emb_dim),
            nn.GELU(),
            nn.Linear(time_emb_dim, time_emb_dim),
        )

        # Input channels are: noisy image, source image, previous target estimate
        self.init_conv = nn.Conv2d(in_channels * 3, dim, 7, padding=3)
        dims = [dim, *map(lambda m: dim * m, dim_mults)]
        in_out = list(zip(dims[:-1], dims[1:]))

        # --- Encoder ---
        self.downs = nn.ModuleList([])
        for i, (dim_in, dim_out) in enumerate(in_out):
            is_last = i >= (len(in_out) - 1)
            self.downs.append(nn.ModuleList([
                ResidualBlock(dim_in, dim_out, time_emb_dim),
                ResidualBlock(dim_out, dim_out, time_emb_dim),
                nn.Conv2d(dim_out, dim_out, 4, 2, 1) if not is_last else nn.Identity()
            ]))

        # --- Bottleneck ---
        mid_dim = dims[-1]
        self.mid_block1 = ResidualBlock(mid_dim, mid_dim, time_emb_dim)
        self.mid_block2 = ResidualBlock(mid_dim, mid_dim, time_emb_dim)

        # --- Decoder (mirrors the encoder; skip features are concatenated) ---
        self.ups = nn.ModuleList([])
        for i, (dim_in, dim_out) in enumerate(reversed(in_out)):
            is_last = i >= (len(in_out) - 1)
            self.ups.append(nn.ModuleList([
                ResidualBlock(dim_out * 2, dim_in, time_emb_dim),
                ResidualBlock(dim_in, dim_in, time_emb_dim),
                nn.ConvTranspose2d(dim_in, dim_in, 4, 2, 1) if not is_last else nn.Identity()
            ]))

        self.final_res_block = ResidualBlock(dim, dim, time_emb_dim)
        self.final_conv = nn.Conv2d(dim, out_channels, 1)

    def forward(self, x_t, t, y, x0_r):
        # Concatenate inputs along the channel dimension
        x = torch.cat([x_t, y, x0_r], dim=1)

        t_emb = self.time_mlp(t)
        x = self.init_conv(x)
        skips = []

        # Downsampling: store one skip feature per resolution before downsampling
        for res1, res2, downsample in self.downs:
            x = res1(x, t_emb)
            x = res2(x, t_emb)
            skips.append(x)
            x = downsample(x)

        # Bottleneck
        x = self.mid_block1(x, t_emb)
        x = self.mid_block2(x, t_emb)

        # Upsampling: consume the skip features in reverse order
        for res1, res2, upsample in self.ups:
            x = torch.cat((x, skips.pop()), dim=1)
            x = res1(x, t_emb)
            x = res2(x, t_emb)
            x = upsample(x)

        x = self.final_res_block(x, t_emb)
        return self.final_conv(x)


class Discriminator(nn.Module):
    """
    A simple convolutional discriminator network.
    It takes the image sample and timestep as input and outputs a logit.
    """
    def __init__(self, in_channels, time_emb_dim=256, dim=64, dim_mults=(1, 2, 4, 8)):
        super().__init__()
        
        self.time_mlp = nn.Sequential(
            SinusoidalPosEmb(dim),
            nn.Linear(dim, time_emb_dim),
            nn.GELU(),
            nn.Linear(time_emb_dim, time_emb_dim),
        )
        
        # Input channels are: image sample, and concatenated x_t
        self.init_conv = nn.Conv2d(in_channels * 2, dim, 3, padding=1)
        
        dims = [dim, *map(lambda m: dim * m, dim_mults)]
        in_out = list(zip(dims[:-1], dims[1:]))

        self.stages = nn.ModuleList([])
        for i, (dim_in, dim_out) in enumerate(in_out):
            self.stages.append(nn.ModuleList([
                ResidualBlock(dim_in, dim_out, time_emb_dim),
                nn.Conv2d(dim_out, dim_out, kernel_size=4, stride=2, padding=1)
            ]))
        
        final_dim = dims[-1]
        self.final_conv = nn.Conv2d(final_dim, 1, 3, padding=1)

    def forward(self, x, t, x_t):
        # Concatenate the sample with the conditioned image x_t
        x = torch.cat([x, x_t], dim=1)
        t_emb = self.time_mlp(t)
        x = self.init_conv(x)

        for res, downsample in self.stages:
            x = res(x, t_emb)
            x = downsample(x)
            
        return self.final_conv(x)


# --- Main SelfRDB Model ---

class SelfRDB(nn.Module):
    """
    The main Self-consistent Recursive Diffusion Bridge model.
    """
    def __init__(
        self,
        image_size,
        in_channels,
        out_channels,
        timesteps=10,
        gamma=2.2, # Noise level hyperparameter at the end-point
        num_recursions=2, # R in the paper
        lr=1e-4,
        lambda_l1=1.0,
        lambda_gp=1.0,
        device='cuda'
    ):
        super().__init__()
        self.image_size = image_size
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.timesteps = timesteps
        self.gamma = gamma
        self.num_recursions = num_recursions
        self.device = device

        # Instantiate networks
        self.generator = UNet(in_channels=in_channels, out_channels=out_channels).to(device)
        self.discriminator = Discriminator(in_channels=out_channels).to(device)

        # Optimizers
        self.g_optimizer = torch.optim.Adam(self.generator.parameters(), lr=lr, betas=(0.5, 0.9))
        self.d_optimizer = torch.optim.Adam(self.discriminator.parameters(), lr=lr, betas=(0.5, 0.9))
        
        # Loss weights
        self.lambda_l1 = lambda_l1
        self.lambda_gp = lambda_gp

        # Set up the diffusion schedule
        self.set_schedule()

    def set_schedule(self):
        """
        Sets up the diffusion bridge schedule (Eqs. 5, 6, 8). The accumulated
        coefficients are defined so that the bridge starts exactly at x0
        (mu_x0 = 1 at t=0) and ends at the soft prior around y
        (mu_y = 1, sigma^2 = gamma at t=T).
        """
        T = self.timesteps
        t = torch.arange(T + 1, device=self.device, dtype=torch.float32)

        # Diffusion coefficient g(t) (Eq. 6)
        g_t = (T - 2 * t) ** 2 / (4 * T * (T - t) ** 2)
        g_t[T] = 0  # Handle division by zero at t=T

        # Time-accumulated diffusion coefficients s_t^2 (from 0 to t) and
        # s_bar_t^2 (from t to T), using an exclusive cumulative sum so that
        # s_0^2 = 0 and s_bar_T^2 = 0 (Eq. 5)
        s_t_sq = torch.cat([torch.zeros(1, device=self.device), torch.cumsum(g_t[:-1], dim=0)])
        s_bar_t_sq = s_t_sq[-1] - s_t_sq
        denom = s_t_sq + s_bar_t_sq  # constant total accumulated coefficient

        # Mean schedule weights (Eq. 5)
        self.mu_x0_t = s_bar_t_sq / denom
        self.mu_y_t = s_t_sq / denom

        # Noise variance schedule, rising from 0 to gamma at the end-point (Eq. 8)
        self.sigma_t_sq = self.gamma * s_t_sq / denom
        self.sigma_t = torch.sqrt(self.sigma_t_sq)

        # Transition variance used in the posterior calculation (Eq. 14)
        self.sigma_t_given_t_minus_1_sq = self.sigma_t_sq[1:] - self.sigma_t_sq[:-1] * \
                                          (self.mu_x0_t[1:] / self.mu_x0_t[:-1]) ** 2

    def _extract(self, a, t, x_shape):
        """Helper to extract values from schedule tensors."""
        batch_size = t.shape[0]
        out = a.to(t.device).gather(0, t)
        return out.reshape(batch_size, *((1,) * (len(x_shape) - 1)))

    def q_sample(self, x0, y, t):
        """
        Forward process q(x_t | x0, y) (Eq. 3 & 4).
        Generates a noisy sample x_t at timestep t.
        """
        mu_x0 = self._extract(self.mu_x0_t, t, x0.shape)
        mu_y = self._extract(self.mu_y_t, t, y.shape)
        sigma = self._extract(self.sigma_t, t, x0.shape)
        
        noise = torch.randn_like(x0)
        x_t = mu_x0 * x0 + mu_y * y + sigma * noise
        return x_t, noise

    def p_mean_variance(self, x_t, t, y, x0_hat):
        """
        Calculates the mean and variance of the reverse posterior q(x_{t-1} | x_t, y, x0_hat)
        as derived in Eqs. 13 and 14.
        """
        sigma_t_sq = self._extract(self.sigma_t_sq, t, x_t.shape)
        sigma_t_minus_1_sq = self._extract(self.sigma_t_sq, t - 1, x_t.shape)
        
        mu_x0_t = self._extract(self.mu_x0_t, t, x_t.shape)
        mu_x0_t_minus_1 = self._extract(self.mu_x0_t, t - 1, x_t.shape)

        mu_y_t = self._extract(self.mu_y_t, t, x_t.shape)
        mu_y_t_minus_1 = self._extract(self.mu_y_t, t - 1, x_t.shape)
        
        sigma_t_given_t_minus_1_sq = self._extract(self.sigma_t_given_t_minus_1_sq, t-1, x_t.shape)
        
        # Posterior mean (Eq. 13)
        term1 = (sigma_t_minus_1_sq / sigma_t_sq) * (mu_x0_t / mu_x0_t_minus_1) * x_t
        term2 = (mu_y_t_minus_1 - mu_y_t * (sigma_t_minus_1_sq / sigma_t_sq) * (mu_x0_t / mu_x0_t_minus_1)) * y
        term3 = (1 - mu_y_t_minus_1 * (sigma_t_given_t_minus_1_sq / sigma_t_sq)) * x0_hat
        posterior_mean = term1 + term2 + term3

        # Posterior variance (Eq. 14)
        posterior_variance = sigma_t_given_t_minus_1_sq * (sigma_t_minus_1_sq / sigma_t_sq)
        
        return posterior_mean, posterior_variance

    def _gradient_penalty(self, real_data, generated_data, t, x_t):
        """Calculates the gradient penalty for discriminator training."""
        batch_size = real_data.size(0)
        alpha = torch.rand(batch_size, 1, 1, 1, device=self.device)
        alpha = alpha.expand_as(real_data)
        
        interpolated = (alpha * real_data + (1 - alpha) * generated_data).requires_grad_(True)
        d_interpolated = self.discriminator(interpolated, t, x_t)
        
        grad_outputs = torch.ones_like(d_interpolated, requires_grad=False)
        
        gradients = torch.autograd.grad(
            outputs=d_interpolated,
            inputs=interpolated,
            grad_outputs=grad_outputs,
            create_graph=True,
            retain_graph=True,
        )[0].view(batch_size, -1)
        
        gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
        return gradient_penalty

    def forward(self, x0, y):
        """
        A single training step for SelfRDB.
        x0: target image (ground truth)
        y: source image
        """
        batch_size = x0.shape[0]
        
        # 1. Sample a random timestep t
        t = torch.randint(1, self.timesteps + 1, (batch_size,), device=self.device).long()
        
        # 2. Generate noisy sample x_t and actual sample x_{t-1}
        x_t, _ = self.q_sample(x0, y, t)
        x_t_minus_1, _ = self.q_sample(x0, y, t - 1)

        # --- Train Discriminator ---
        self.d_optimizer.zero_grad()

        # Self-consistent recursive estimation for x0_hat
        with torch.no_grad():
            x0_r = torch.zeros_like(x0) # Initialize estimate
            for _ in range(self.num_recursions):
                x0_r = self.generator(x_t, t, y, x0_r)
            x0_hat = x0_r
            
            # Sample synthetic x_{t-1}
            posterior_mean, posterior_var = self.p_mean_variance(x_t, t, y, x0_hat)
            noise = torch.randn_like(x_t)
            x_t_minus_1_hat = posterior_mean + torch.sqrt(posterior_var) * noise

        # Get discriminator logits for real and fake samples
        real_logits = self.discriminator(x_t_minus_1.detach(), t, x_t.detach())
        fake_logits = self.discriminator(x_t_minus_1_hat.detach(), t, x_t.detach())
        
        # Adversarial loss (Eq. 17)
        d_loss_real = -torch.log(torch.sigmoid(real_logits)).mean()
        d_loss_fake = -torch.log(1 - torch.sigmoid(fake_logits)).mean()
        
        # Gradient penalty
        gp = self._gradient_penalty(x_t_minus_1.detach(), x_t_minus_1_hat.detach(), t, x_t.detach())
        
        d_loss = d_loss_real + d_loss_fake + self.lambda_gp * gp
        d_loss.backward()
        self.d_optimizer.step()
        
        # --- Train Generator ---
        self.g_optimizer.zero_grad()
        
        # Self-consistent recursive estimation (with gradients)
        x0_r = torch.zeros_like(x0) # Initialize estimate
        for _ in range(self.num_recursions):
            x0_r = self.generator(x_t, t, y, x0_r)
        x0_hat = x0_r
        
        # Sample synthetic x_{t-1} again for generator training
        posterior_mean, posterior_var = self.p_mean_variance(x_t, t, y, x0_hat)
        noise = torch.randn_like(x_t)
        x_t_minus_1_hat = posterior_mean + torch.sqrt(posterior_var) * noise
        
        # Get discriminator logits for the generated sample
        gen_logits = self.discriminator(x_t_minus_1_hat, t, x_t.detach())

        # Generator loss (Eq. 16)
        g_loss_adv = -torch.log(torch.sigmoid(gen_logits)).mean()
        g_loss_l1 = F.l1_loss(x0_hat, x0)
        
        g_loss = g_loss_adv + self.lambda_l1 * g_loss_l1
        g_loss.backward()
        self.g_optimizer.step()

        return g_loss.item(), d_loss.item(), g_loss_l1.item(), gp.item()

    @torch.no_grad()
    def sample(self, y):
        """
        Inference function to generate a target image from a source image y.
        Implements Algorithm 1 from the paper.
        """
        self.generator.eval()
        batch_size = y.shape[0]
        
        # Start from the noise-added source image (end-point)
        t_T = torch.full((batch_size,), self.timesteps, device=self.device, dtype=torch.long)
        x_t, _ = self.q_sample(torch.zeros_like(y), y, t_T)
        
        for t_val in tqdm(reversed(range(1, self.timesteps + 1)), desc="SelfRDB Sampling", total=self.timesteps):
            t = torch.full((batch_size,), t_val, device=self.device, dtype=torch.long)
            
            # Self-consistent recursive estimation of x0_hat (Eq. 10)
            x0_r = torch.zeros_like(y) # Initialize estimate
            for _ in range(self.num_recursions):
                x0_r = self.generator(x_t, t, y, x0_r)
            x0_hat = x0_r
            
            # Posterior sampling for x_{t-1} (Eq. 11)
            posterior_mean, posterior_var = self.p_mean_variance(x_t, t, y, x0_hat)
            noise = torch.randn_like(x_t) if t_val > 1 else torch.zeros_like(x_t)
            x_t = posterior_mean + torch.sqrt(posterior_var) * noise
            
        self.generator.train()
        # The final sample is x_0, which is the last x_t computed
        return x_t


if __name__ == '__main__':
    # --- Example Usage ---
    
    # Configuration
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    image_size = 128
    in_channels = 1 # e.g., T1-weighted MRI
    out_channels = 1 # e.g., T2-weighted MRI
    batch_size = 4
    
    # Create dummy data
    # In a real scenario, you would use your DataLoader here
    source_images = torch.randn(batch_size, in_channels, image_size, image_size, device=device)
    target_images = torch.randn(batch_size, out_channels, image_size, image_size, device=device)
    
    # Initialize the model
    model = SelfRDB(
        image_size=image_size,
        in_channels=in_channels,
        out_channels=out_channels,
        timesteps=10, # As per the paper for fast sampling
        num_recursions=2, # As per the paper
        device=device
    ).to(device)
    
    # --- Training Loop Example ---
    print("Starting dummy training loop...")
    epochs = 5
    for epoch in range(epochs):
        # In a real scenario, you would loop over your dataset
        g_loss, d_loss, l1_loss, gp = model(target_images, source_images)
        print(f"Epoch {epoch+1}/{epochs} | G_Loss: {g_loss:.4f} | D_Loss: {d_loss:.4f} | L1: {l1_loss:.4f} | GP: {gp:.4f}")
        
    print("\nTraining loop finished.")
    
    # --- Inference Example ---
    print("\nStarting dummy inference...")
    # Use a single source image for sampling
    source_image_for_sampling = source_images[:1] 
    generated_image = model.sample(source_image_for_sampling)
    
    print(f"Inference finished. Generated image shape: {generated_image.shape}")
    # You can now save or visualize the `generated_image`
