ORCAS: How Variable CAIPIRINHA and Artefact-Aware AI Finally Made Whole-Heart Cardiac DTI Clinically Feasible | AI Trend Blend

Medical AI · Medical Image Analysis 112 (2026) 104115 · 20 min read

ORCAS Compressed a Two-Hour Heart Scan Into Seven Minutes — Without Losing What Matters

A team from Imperial College London, Royal Brompton Hospital, and TU Munich built a framework that combines variable CAIPIRINHA acquisition with artefact-aware dual-branch deep learning to achieve 18-fold acceleration in cardiac diffusion tensor imaging while cutting key biomarker errors by up to 64 percent.


Cardiac diffusion tensor imaging is one of the most information-rich tools in modern cardiology. It can map the microscopic architecture of heart muscle in a living patient without a single needle or biopsy. It shows how cardiomyocytes spiral through the ventricular wall, how laminar sheetlets reorient between contraction and relaxation, and how that architecture breaks down in disease. The problem is that acquiring a whole-heart scan at clinical quality used to take more than two hours. No patient can lie still in a scanner for that long. That clinical barrier is exactly what Michael Tänzer, Eun Ji Lim, Guang Yang, Sonia Nielles-Vallespin and their colleagues at Imperial College and the Royal Brompton set out to break, and the results they published in Medical Image Analysis suggest they have done it.

Why Cardiac Diffusion Tensor Imaging Is So Hard to Accelerate

Cardiac diffusion tensor imaging works by measuring how water molecules diffuse through cardiac muscle tissue. Because cardiomyocytes are long and thin and arranged in organized helical layers, water moves more freely along their length than across their width. The diffusion tensor captures that anisotropy as a 3×3 matrix for every voxel, and from those matrices you can compute biomarkers that describe the tissue in clinically meaningful terms. Mean diffusivity tells you about overall tissue integrity. Fractional anisotropy measures how ordered the microstructure is. Helix angle describes how the fiber orientation shifts from the inner to the outer wall, a gradient that is disrupted in conditions like myocardial infarction and hypertrophic cardiomyopathy. Second eigenvector angle captures the orientation of the laminar sheetlets that are thought to enable the heart to thicken during contraction.
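To make the two scalar biomarkers concrete, here is a minimal sketch of how mean diffusivity and fractional anisotropy follow from the three eigenvalues of a diffusion tensor, using the standard textbook definitions. The eigenvalues in the example are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def mean_diffusivity(eigenvalues):
    """Mean of the three diffusion tensor eigenvalues."""
    l1, l2, l3 = eigenvalues
    return (l1 + l2 + l3) / 3.0

def fractional_anisotropy(eigenvalues):
    """Standard FA: 0 for isotropic diffusion, 1 when only one eigenvalue is non-zero."""
    md = mean_diffusivity(eigenvalues)
    spread = sum((lam - md) ** 2 for lam in eigenvalues)
    energy = sum(lam ** 2 for lam in eigenvalues)
    if energy == 0.0:
        return 0.0
    return math.sqrt(1.5 * spread / energy)

# Hypothetical eigenvalues in units of 1e-3 mm^2/s (illustrative only)
example = (1.6, 1.0, 0.7)
print(f"MD = {mean_diffusivity(example):.3f} x 1e-3 mm^2/s")
print(f"FA = {fractional_anisotropy(example):.3f}")
```

Helix angle and second eigenvector angle depend on the eigenvectors rather than the eigenvalues, which is why the article treats them as a separate, directional family of biomarkers.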

The technical barrier to fast acquisition is signal quality. Diffusion imaging encodes molecular motion by deliberately dephasing spins with gradient pulses and then measuring how much signal is recovered. The diffusion contrast is small compared to the noise, so standard protocols average eight to ten repeated acquisitions of every image to achieve an acceptable signal-to-noise ratio. For a full clinical protocol that covers twelve slices at two cardiac phases, six diffusion directions, and three b-values, that averaging requirement pushes total scan time past four hours.
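The reason averaging is so expensive can be illustrated with a small simulation. The sketch below assumes zero-mean Gaussian noise that is independent between repetitions, which is a simplification (magnitude MR images have Rician noise), and shows that the noise standard deviation only falls with the square root of the number of averages. The function name and trial count are illustrative choices.

```python
import random
import statistics

def noise_std_after_averaging(n_avg, sigma=1.0, n_trials=4000, seed=0):
    """Empirical noise std of the mean of n_avg independent Gaussian samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_avg))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)

for n in (1, 4, 10):
    measured = noise_std_after_averaging(n)
    print(f"{n:2d} averages: noise std ~ {measured:.3f} (theory {n ** -0.5:.3f})")
```

Under this assumption, ten averages buy roughly a threefold noise reduction at ten times the scan time, which is the trade-off the acceleration strategies try to escape.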

Two acceleration strategies exist, and both have real costs. Reducing the number of averaged repetitions directly degrades signal-to-noise ratio, introducing noise that corrupts the diffusion tensor estimates and inflates fractional anisotropy through a well-understood noise floor effect. Simultaneous multi-slice imaging acquires multiple slices at once by exciting them all in the same radio-frequency pulse and then separating them computationally. The separation step works well when the coil sensitivity profiles provide clear spatial encoding, but when adjacent slices are close together as they are in the heart, residual aliasing from one slice leaks into neighboring slices and appears as a structured artefact that is nearly impossible to distinguish from real anatomy using conventional reconstruction. The two problems compound each other when you try to apply both strategies simultaneously.

The Core Problem

Conventional cardiac DTI requires eight to ten signal averages per image just to achieve workable image quality, because the diffusion contrast is inherently weak. Simultaneous multi-slice imaging offers a direct speed multiplier but introduces inter-slice leakage artefacts that conventional reconstruction cannot cleanly remove. When both approaches are combined under a fixed CAIPIRINHA scheme, the artefacts are coherent across repetitions and indistinguishable from anatomy. ORCAS resolves this by engineering the artefacts to be incoherent across repetitions, then deploying a domain-aware AI model trained to exploit that incoherence.

The Three Innovations Inside ORCAS

The ORCAS framework is not a single algorithmic contribution. It is a carefully designed system where three innovations work together, each enabling the next to deliver results that none could achieve alone.

Variable CAIPIRINHA Shifts for Artefact Incoherence

CAIPIRINHA is the technique that makes simultaneous multi-slice imaging practical. Rather than exciting all slices with identical phase encoding, CAIPIRINHA applies a k-space phase modulation that shifts the aliased signal from each simultaneously acquired slice to a different field-of-view position. This spatial separation helps the reconstruction algorithm tell the slices apart. The standard formulation uses a fixed phase shift that depends only on which slice is being excited. That fixed shift produces artefacts that appear in exactly the same location with exactly the same shape in every single repetition. When you average multiple repetitions, those artefacts average coherently and persist in the final image as if they were anatomy.
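The mechanism rests on the Fourier shift theorem: a linear phase ramp in k-space moves the image along the phase-encode direction. The toy sketch below works on a one-dimensional, eight-sample "slice" with a naive DFT, so it is a cartoon of the principle rather than a reconstruction pipeline. The function names and sample values are made up for illustration, and only the half field-of-view case is shown because it maps onto a whole-sample shift.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (fine for tiny demo arrays)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def collapse(slice_a, slice_b, fov_fraction):
    """Sum two slices in k-space after phase-modulating the second slice."""
    n = len(slice_a)
    ka, kb = dft(slice_a), dft(slice_b)
    summed = [ka[k] + kb[k] * cmath.exp(-2j * cmath.pi * k * fov_fraction)
              for k in range(n)]
    return [round(abs(v), 6) for v in idft(summed)]

slice_a = [0, 0, 1, 1, 0, 0, 0, 0]   # toy anatomy in slice 1
slice_b = [0, 0, 2, 2, 0, 0, 0, 0]   # toy anatomy in slice 2, same position
no_shift = collapse(slice_a, slice_b, 0.0)
half_fov = collapse(slice_a, slice_b, 0.5)
print("no shift :", no_shift)
print("FOV/2    :", half_fov)
```

Without the phase ramp the two slices pile up at the same samples. With it, the second slice lands half a field of view away, which is what gives the unaliasing step something to work with.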

The ORCAS solution is conceptually simple but requires careful engineering. By making the CAIPIRINHA phase shift depend on the repetition index as well as the slice position, each repetition produces a different artefact pattern, so the artefacts become incoherent across repetitions. An AI model trained on data with diverse artefact patterns across repetitions can learn to distinguish artefacts from anatomy in a way that a model trained on a single fixed pattern cannot. The specific formulation alternates the shift direction between consecutive repetitions, distributes distinct shift magnitudes evenly across the repetition set, and bounds the denominator of the fractional field-of-view shift so that the simultaneously excited slices remain effectively separated. The authors report that the resulting variable pattern was validated empirically to maximize decorrelation across the repetition dimension.
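To see what such a schedule looks like in numbers, the sketch below tabulates the shift for each repetition using the same expression that appears in this article's code listing (the one labelled Eq. 3 there). Whether that expression matches the paper's equation term for term has not been checked here, so treat the values as an illustration of the alternating, evenly spread pattern rather than as the published schedule.

```python
import math

def shift_fraction(r, s, R, S=2):
    """Field-of-view shift fraction for repetition r and slice s (both 1-indexed).

    Transcribed from the expression in this article's code listing; not
    verified against the paper's Eq. 3.
    """
    step = math.floor((r - 1) / 2) / max(1, math.floor(R / 2) - 1)
    denom = 2 + (S - 1) * step          # runs from 2 up to S + 1
    return (s - 1) * (-1) ** r / denom

R = 10
schedule = [shift_fraction(r, 2, R) for r in range(1, R + 1)]
for r, frac in enumerate(schedule, start=1):
    print(f"rep {r:2d}: slice 2 shifted by {frac:+.3f} FOV")
```

For ten repetitions at SMS factor 2 this produces ten distinct shifts whose sign alternates and whose magnitude moves from one half toward one third of the field of view, so no two repetitions place the leakage artefact in the same spot.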

Dual-Branch AI Reconstruction with Auxiliary Single-Band Input

The deep learning component builds on the NAFNet architecture, a convolutional image restoration model that achieves strong performance on denoising and artefact removal tasks with a simplified design that removes the nonlinear activation functions that earlier models relied on. ORCAS extends this into a dual-branch framework designed around the fact that cardiac DTI produces two distinct types of clinically meaningful information.

The scalar branch processes diffusion-weighted images directly. It receives the SMS images with their leakage artefacts as input, alongside artefact-free single-band b0 reference images as auxiliary channels. The NAFNet model in this branch outputs clean magnitude images that are then used to compute diffusion tensors and derive mean diffusivity and fractional anisotropy maps. The directional branch operates not on images but on the 3×3 diffusion tensor matrices themselves. It takes tensors computed from the artefact-contaminated DWI data as input and outputs improved tensors from which helix angle and second eigenvector angle maps are derived. This distinction matters because standard image-domain processing optimizes for image appearance, not for the preservation of the angular information that encodes fiber orientation. A single branch that polishes images without regard to tensor geometry risks degrading the directional biomarkers that carry the most clinical value.

The single-band auxiliary input deserves special mention because it turns out to be responsible for a large fraction of the overall performance gain. By providing the model with a patient-specific artefact-free reference at b0, the reconstruction is guided by genuine anatomical detail rather than relying entirely on learned priors from training data. This is analogous to the auto-calibration signal used in conventional parallel imaging, where a portion of k-space is reserved to calibrate the reconstruction rather than contribute directly to image formation. The cost is a small reduction in net acceleration factor from 20-fold to 18.57-fold for the most aggressive protocol, a trade-off the results fully justify.

Combined Repetition Reduction and SMS Acceleration

The third innovation is demonstrating that these two acceleration strategies can be combined effectively within the ORCAS framework. Repetition reduction and simultaneous multi-slice imaging are complementary accelerators, one addressing temporal redundancy and one addressing slice acquisition overhead, but their combination had previously produced reconstruction challenges too severe for conventional methods to handle reliably. By training the ORCAS models on data that deliberately varies both the number of repetitions and the CAIPIRINHA shift pattern, the framework learns to handle the combined challenge. The result is an effective acceleration of more than 18-fold, reducing a biphasic whole-heart scan from over four hours to under thirteen minutes.
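The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below treats the nominal acceleration as the SMS factor times the reduction in repetitions and uses the article's round baseline figures (about four hours biphasic, about two hours for a single phase) as assumptions. The overhead value is simply what the quoted 20-fold and 18.57-fold factors imply and is not a breakdown given in the article.

```python
def nominal_acceleration(sms_factor, full_reps, used_reps):
    """Speedup from SMS and repetition reduction, ignoring auxiliary scans."""
    return sms_factor * full_reps / used_reps

def accelerated_minutes(baseline_minutes, acceleration):
    """Scan duration after applying an acceleration factor."""
    return baseline_minutes / acceleration

nominal = nominal_acceleration(sms_factor=2, full_reps=10, used_reps=1)
net = 18.57                          # net factor quoted in the article
overhead = nominal / net - 1.0       # extra acquisition implied by the auxiliary images
biphasic = accelerated_minutes(240.0, net)   # assumed 4 h baseline
single = accelerated_minutes(120.0, net)     # assumed 2 h baseline

print(f"nominal acceleration : {nominal:.0f}x")
print(f"implied aux overhead : {overhead:.1%}")
print(f"biphasic scan        : {biphasic:.1f} min")
print(f"single-phase scan    : {single:.1f} min")
```

Under these assumptions the biphasic figure lands just under thirteen minutes and the single-phase figure at about six and a half, which squares the "seven minutes" in the headline with the "thirteen minutes" quoted for the full protocol.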

Data and Experimental Design

The study used single-band data from 24 ex-vivo healthy swine hearts acquired at 2 mm in-plane resolution with a 2D single-shot EPI DTI sequence. Each heart was scanned with three b-values (0, 150, and 750 s/mm²), six diffusion directions, and ten repetitions per slice, providing a comprehensive reference dataset. The simultaneous multi-slice acquisitions were simulated by applying appropriate k-space phase shifts to the single-band data and summing the slices, enabling controlled comparison against known ground truth that would be impossible to obtain from genuinely simultaneous acquisitions. Both SMS factor 2 (two slices acquired simultaneously) and SMS factor 3 (three slices simultaneously) scenarios were evaluated, and repetition protocols ranging from one to six repetitions were tested alongside the full ten-repetition reference.

To validate that the model preserves genuine tissue properties rather than simply producing visually plausible outputs, the researchers developed a physical ex-vivo model of microstructural abnormality. A transmural section of the left ventricular free wall was surgically removed, rotated 90 degrees in-plane, and reattached. This creates a known, spatially localized discontinuity in the helix angle map where the normal transmural gradient is reversed. If the model correctly preserves tissue microstructure, this artificial abnormality should remain identifiable in the reconstructed maps. If the model smooths over it, the abnormality disappears and clinical utility is compromised.

Results: Where ORCAS Pulls Ahead

Across every experimental condition tested, the ORCAS framework significantly outperformed conventional reconstruction. The improvements were most dramatic in the clinically relevant extreme-acceleration scenario of a single repetition combined with SMS factor 2, which produces an 18.57-fold overall speedup.

Protocol	Repetitions	MD Error	FA Error	HA Error	E2A Error	Speedup
Fixed (No AI)	10	0.040	0.037	13.1°	15.3°	2x
Variable + AI	10	0.038	0.033	12.3°	15.4°	2x
ORCAS (Full)	10	0.030	0.028	9.3°	10.8°	2x
Fixed (No AI)	1	0.116	0.147	28.7°	27.6°	20x
ORCAS (Full)	1	0.060	0.053	13.3°	16.8°	18.57x

Table: Mean absolute error for cDTI maps at SMS factor 2. MD is in units of 10⁻³ mm²/s, FA is unitless, and HA and E2A are in degrees. All improvements from ORCAS over the fixed no-AI baseline are statistically significant at p < 0.05 (Wilcoxon signed-rank test).

At a single repetition, ORCAS reduced FA error by 63.9 percent compared to the fixed conventional baseline (from 0.147 down to 0.053). Helix angle error dropped by 53.7 percent (from 28.7 degrees to 13.3 degrees). Mean diffusivity error fell by 48.3 percent. These are not incremental improvements. They represent the difference between data that is clinically usable and data that is not.
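The percentages above follow directly from the single-repetition rows of the table. A short check, using only the table values, is sketched below. The dictionary and function names are illustrative.

```python
# Single-repetition rows of the table (SMS factor 2)
fixed_no_ai = {"MD": 0.116, "FA": 0.147, "HA": 28.7, "E2A": 27.6}
orcas_full = {"MD": 0.060, "FA": 0.053, "HA": 13.3, "E2A": 16.8}

def percent_reduction(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before

reductions = {
    name: percent_reduction(fixed_no_ai[name], orcas_full[name])
    for name in fixed_no_ai
}
for name, value in reductions.items():
    print(f"{name:>3}: {value:.1f}% lower error with ORCAS")
```

The same calculation applied to the E2A column gives a reduction of about 39 percent, the smallest of the four biomarkers.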

What the Ablation Reveals

Systematic ablation of each ORCAS component makes clear that all three contribute meaningfully and that the variable shift strategy specifically enables the AI components to work well. When the standard fixed CAIPIRINHA shift is used without AI reconstruction at ten repetitions, DWI NRMSE is 0.073. Adding AI without changing the shift scheme brings it to 0.070, a modest improvement. Switching to variable shifts without AI brings it to 0.065. The full ORCAS combination with variable shifts and AI achieves 0.050. The synergy between deliberate artefact engineering and AI training on diverse artefact patterns is the critical mechanism.
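For readers unfamiliar with the metric, here is a minimal sketch of one common NRMSE convention, in which the root-mean-square error is divided by the root-mean-square of the reference. The article does not state which normalisation the paper uses, so this choice is an assumption and the absolute numbers it produces are not directly comparable to the values quoted above.

```python
import math

def nrmse(estimate, reference):
    """RMSE normalised by the RMS of the reference (one common convention)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    reference_energy = sum(r ** 2 for r in reference)
    if reference_energy == 0.0:
        raise ValueError("reference must not be all zeros")
    return math.sqrt(squared_error / reference_energy)

# Hypothetical pixel intensities, flattened to a list for simplicity
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.1, 1.9, 3.2, 3.9]
print(f"NRMSE = {nrmse(estimate, reference):.3f}")
```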

Comparing against a standard single-branch U-Net baseline trained without the variable shift or auxiliary input confirms that the ORCAS gains are not simply a consequence of applying any deep learning at all. The U-Net at ten repetitions achieves HA error of 15.2 degrees. ORCAS achieves 9.3 degrees. The dedicated tensor branch, the variable CAIPIRINHA shift, and the patient-specific auxiliary input each contribute independently to this gap.

Preservation of Known Microstructural Properties

One of the most important validation results concerns the transmural helix angle gradient, the well-established physiological property that cardiomyocyte orientation shifts smoothly from roughly plus 60 degrees at the endocardium to roughly minus 60 degrees at the epicardium. At ten repetitions, ORCAS preserves this gradient with a median value of minus 7.9, identical to the single-band reference. Conventional SMS reconstruction with variable shift achieves a median of minus 7.2. More telling is the goodness-of-fit analysis. ORCAS maintains a median R-squared of 0.96 for transmural line fits, on par with the single-band reference value of 0.95. Conventional SMS drops to 0.74 at ten repetitions and 0.56 at a single repetition. In practice, ORCAS retains 74.2 percent of transmural lines meeting quality criteria at a single repetition, whereas conventional SMS retains only 27.6 percent. The gradient value itself is relatively robust to noise in the lines that survive, but most lines from conventional SMS at high acceleration are too corrupted to yield reliable fits at all.
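The line-fit analysis can be pictured as an ordinary least-squares fit of helix angle against wall depth along each transmural line. The sketch below uses an idealised profile running from plus 60 to minus 60 degrees over a normalised wall depth, plus a noisy copy with a hypothetical noise level. The depth normalisation and noise value are assumptions, so the slope here is in degrees per wall thickness and is not in the same units as the gradient values quoted above.

```python
import random

def linear_fit(x, y):
    """Ordinary least-squares line fit; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Normalised wall depth: 0 = endocardium, 1 = epicardium (assumed convention)
depth = [i / 10 for i in range(11)]
clean_ha = [60.0 - 120.0 * d for d in depth]

rng = random.Random(0)
noisy_ha = [ha + rng.gauss(0.0, 25.0) for ha in clean_ha]  # hypothetical noise level

slope_clean, _, r2_clean = linear_fit(depth, clean_ha)
slope_noisy, _, r2_noisy = linear_fit(depth, noisy_ha)
print(f"clean: slope = {slope_clean:.1f} deg per wall thickness, R^2 = {r2_clean:.2f}")
print(f"noisy: slope = {slope_noisy:.1f} deg per wall thickness, R^2 = {r2_noisy:.2f}")
```

The point of the illustration is that noise can leave the fitted slope in a plausible range while dragging R-squared down, which mirrors the observation that the surviving lines give robust gradients but far fewer lines survive.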

Abnormality Preservation

Even at a single repetition, the engineered microstructural abnormality remains visually identifiable in ORCAS reconstructions. The characteristic reversal of the transmural helix angle gradient in the rotated segment is still present, and the discontinuities in the second eigenvector angle map at the segment boundaries remain visible. Error within the rotated region is higher at single repetition (32 degrees for helix angle, 27 degrees for E2A) than at ten repetitions (19 degrees and 17 degrees respectively), and spatial definition is reduced, but the diagnostic signature persists. This finding directly supports clinical translation, where the whole point of acquiring cDTI is to detect pathological deviation from normal microstructure.

“Deliberate engineering of artefact incoherence, coupled with domain-aware AI models, can achieve substantial acceleration factors while preserving essential microstructural information.” Tänzer et al., Medical Image Analysis 2026

Proposed Model Code in PyTorch

The following is an illustrative PyTorch sketch of the ORCAS framework as described in the paper. It should be read as a reconstruction from that description rather than as the authors' reference code. It covers the dual-branch NAFNet architecture, the variable CAIPIRINHA shift simulation, the scalar branch processing diffusion-weighted images, the directional branch processing diffusion tensors, the auxiliary single-band input integration, a single training step with mean absolute error loss, and a runnable smoke test on synthetic cardiac DTI data.

# ============================================================
# ORCAS: Optimised Reconstruction with improved CAIPIRINHA
# for Artefact-removal in SMS cardiac DTI
# Paper: Tänzer et al., Medical Image Analysis 112 (2026) 104115
# Institutions: Imperial College London / Royal Brompton /
#               Technical University of Munich
# ============================================================

from __future__ import annotations
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import Dict, Tuple


# ─── SECTION 1: NAFNet Core Building Block ────────────────────────────────────
# Simplified nonlinear-activation-free network (Chen et al. 2022)
# Uses SimpleGate (split-and-multiply) instead of ReLU or GELU

class SimpleGate(nn.Module):
    """Element-wise gating: splits channel dim in half, multiplies halves."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)
        return x1 * x2


class NAFBlock(nn.Module):
    """
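    NAFNet block with layer normalisation (implemented here as single-group
    GroupNorm), depth-wise conv, SimpleGate, and simplified channel
    attention (SCA). Follows Section 2.2.3.
    """
    def __init__(self, channels: int, ffn_expand: float = 2.0):
        super().__init__()
        # GroupNorm with one group normalises over (C, H, W) per sample,
        # used here as a simple stand-in for LayerNorm on 2D feature maps.
        self.norm1 = nn.GroupNorm(1, channels)
        self.norm2 = nn.GroupNorm(1, channels)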

        # Depth-wise convolution
        self.dw_conv = nn.Conv2d(channels, channels * 2, kernel_size=3,
                                  padding=1, groups=channels)
        self.pw_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate   = SimpleGate()

        # Simplified channel attention
        self.sca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

        # Feed-forward network with SimpleGate
        ffn_ch = int(channels * ffn_expand)
        self.ffn1 = nn.Conv2d(channels, ffn_ch * 2, kernel_size=1)
        self.ffn2 = nn.Conv2d(ffn_ch, channels, kernel_size=1)
        self.ffn_gate = SimpleGate()

        self.beta  = nn.Parameter(torch.ones(1, channels, 1, 1) * 1e-3)
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1) * 1e-3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inp = x
        x   = self.norm1(x)
        x   = self.dw_conv(x)
        x   = self.gate(x)          # (B, C, H, W)
        x   = x * self.sca(x)
        x   = self.pw_conv(x)
        x   = inp + x * self.beta

        inp = x
        x   = self.norm2(x)
        x   = self.ffn1(x)
        x   = self.ffn_gate(x)
        x   = self.ffn2(x)
        x   = inp + x * self.gamma
        return x


# ─── SECTION 2: NAFNet Encoder-Decoder ───────────────────────────────────────

class NAFNet(nn.Module):
    """
    U-Net style NAFNet for image or tensor restoration.
    
    Parameters
    ----------
    in_ch       : input channels (magnitude + phase + aux channels)
    out_ch      : output channels (magnitude images or 9 tensor components)
    width       : base channel count
    enc_blks    : NAFBlocks per encoder stage
    dec_blks    : NAFBlocks per decoder stage
    """
    def __init__(self,
                 in_ch: int   = 8,
                 out_ch: int  = 1,
                 width: int   = 32,
                 enc_blks: Tuple[int, ...] = (1, 1, 1, 2),
                 dec_blks: Tuple[int, ...] = (1, 1, 1, 1)):
        super().__init__()
        self.intro = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)

        self.encoders   = nn.ModuleList()
        self.downs      = nn.ModuleList()
        self.decoders   = nn.ModuleList()
        self.ups        = nn.ModuleList()

        ch = width
        for n_blks in enc_blks:
            self.encoders.append(nn.Sequential(*[NAFBlock(ch) for _ in range(n_blks)]))
            self.downs.append(nn.Conv2d(ch, ch * 2, kernel_size=2, stride=2))
            ch *= 2

        self.middle = nn.Sequential(*[NAFBlock(ch) for _ in range(2)])

        for n_blks in reversed(dec_blks):
            self.ups.append(nn.ConvTranspose2d(ch, ch // 2, kernel_size=2, stride=2))
            ch //= 2
            self.decoders.append(nn.Sequential(*[NAFBlock(ch) for _ in range(n_blks)]))

        self.ending = nn.Conv2d(ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.intro(x)
        skips = []
        for enc, down in zip(self.encoders, self.downs):
            x = enc(x)
            skips.append(x)
            x = down(x)
        x = self.middle(x)
        for up, dec in zip(self.ups, self.decoders):
            x = up(x)
            x = x + skips.pop()
            x = dec(x)
        return self.ending(x)


# ─── SECTION 3: Scalar Branch (DWI Restoration) ──────────────────────────────

class ScalarBranch(nn.Module):
    """
    Restores diffusion-weighted images from SMS acquisitions.
    
    Input  : (R x 2 + n_aux x 2) channels
             -- R SMS images (magnitude + phase each)
             -- n_aux single-band b0 auxiliary images (magnitude + phase each)
    Output : (R,) magnitude images, artefact-suppressed
    
    Implements the scalar branch described in Section 2.2.3.
    These outputs are used to compute MD and FA maps.
    """
    def __init__(self, R: int = 1, n_aux: int = 2, width: int = 32):
        super().__init__()
        in_ch  = R * 2 + n_aux * 2
        out_ch = R
        self.nafnet = NAFNet(in_ch=in_ch, out_ch=out_ch, width=width)

    def forward(self,
                sms_mag: torch.Tensor,
                sms_phase: torch.Tensor,
                aux_mag: torch.Tensor,
                aux_phase: torch.Tensor) -> torch.Tensor:
        """
        Parameters
        ----------
        sms_mag   : (B, R, H, W) SMS magnitude images
        sms_phase : (B, R, H, W) SMS phase images
        aux_mag   : (B, n_aux, H, W) SB auxiliary magnitude images
        aux_phase : (B, n_aux, H, W) SB auxiliary phase images

        Returns
        -------
        clean_mag : (B, R, H, W) artefact-suppressed DWI magnitudes
        """
        x = torch.cat([sms_mag, sms_phase, aux_mag, aux_phase], dim=1)
        return self.nafnet(x)


# ─── SECTION 4: Directional Branch (Tensor Restoration) ──────────────────────

class DirectionalBranch(nn.Module):
    """
    Restores diffusion tensor fields to preserve angular information.
    
    Input  : 9 channels (flattened symmetric 3x3 diffusion tensor per voxel)
    Output : 9 channels (improved diffusion tensor)
    
    Implements the directional branch described in Section 2.2.3.
    Operating directly on tensors preserves HA and E2A accuracy that
    image-domain processing alone cannot optimise.
    """
    def __init__(self, width: int = 32):
        super().__init__()
        self.nafnet = NAFNet(in_ch=9, out_ch=9, width=width)

    def forward(self, tensor_in: torch.Tensor) -> torch.Tensor:
        """
        Parameters
        ----------
        tensor_in : (B, 9, H, W) flattened diffusion tensor components
                    from SMS-corrupted DWI data

        Returns
        -------
        tensor_out : (B, 9, H, W) improved diffusion tensors
        """
        return self.nafnet(tensor_in)


# ─── SECTION 5: Full ORCAS Framework ─────────────────────────────────────────

class ORCAS(nn.Module):
    """
    ORCAS: Optimised Reconstruction with improved CAIPIRINHA
    for Artefact-removal in SMS cardiac DTI.
    
    Dual-branch architecture (Section 2.2.3):
      ScalarBranch   : SMS DWI images + SB aux -> clean DWI -> MD, FA
      DirectionalBranch : noisy tensors -> clean tensors -> HA, E2A
    
    Both branches trained simultaneously with MAE loss against
    single-band ground truth for 500 epochs (Adam, cosine LR decay
    from 1e-4 to 1e-7).
    
    Parameters
    ----------
    R       : number of SMS repetitions (1, 2, 4, or 6 in paper)
    n_aux   : auxiliary single-band b0 images (= R for most cases,
               = 2 for extreme R=1 acceleration scenario)
    width   : NAFNet base channel count (controls model capacity)
    """
    def __init__(self, R: int = 1, n_aux: int = 2, width: int = 32):
        super().__init__()
        self.R     = R
        self.n_aux = n_aux
        self.scalar_branch      = ScalarBranch(R=R, n_aux=n_aux, width=width)
        self.directional_branch = DirectionalBranch(width=width)

    def forward(self,
                sms_mag: torch.Tensor,
                sms_phase: torch.Tensor,
                aux_mag: torch.Tensor,
                aux_phase: torch.Tensor,
                tensor_in: torch.Tensor) -> Dict[str, torch.Tensor]:
        """
        Full ORCAS forward pass.
        
        Parameters
        ----------
        sms_mag   : (B, R, H, W) SMS magnitude images with leakage artefacts
        sms_phase : (B, R, H, W) SMS phase images with leakage artefacts
        aux_mag   : (B, n_aux, H, W) artefact-free SB b0 auxiliary magnitudes
        aux_phase : (B, n_aux, H, W) artefact-free SB b0 auxiliary phases
        tensor_in : (B, 9, H, W) diffusion tensors from SMS DWI data

        Returns
        -------
        dict with clean DWI magnitudes and improved diffusion tensors
        """
        clean_dwi    = self.scalar_branch(sms_mag, sms_phase, aux_mag, aux_phase)
        clean_tensor = self.directional_branch(tensor_in)
        return {
            'clean_dwi':    clean_dwi,      # (B, R, H, W) -- used for MD, FA
            'clean_tensor': clean_tensor,   # (B, 9, H, W) -- used for HA, E2A
        }


# ─── SECTION 6: Variable CAIPIRINHA Shift Simulation ─────────────────────────

def variable_caipirinha_shift(
    slices: torch.Tensor,
    R: int,
    r: int,
    S: int = 2
) -> torch.Tensor:
    """
    Apply variable CAIPIRINHA phase shift for repetition r.
    Implements Equation 3 from Section 2.2.1.
    
    The shift phi is made repetition-dependent to decorrelate leakage
    artefacts across repetitions -- the key innovation that enables
    the AI model to distinguish artefacts from anatomy.
    
    Parameters
    ----------
    slices : (S, D, ky) k-space data for S simultaneous slices
    R      : total number of repetitions
    r      : current repetition index (1-indexed)
    S      : SMS factor (number of simultaneous slices)

    Returns
    -------
    collapsed : (D, ky) SMS-combined k-space with variable phase shift
    """
    S_count, D, ky = slices.shape
    collapsed = torch.zeros(D, ky, dtype=torch.cfloat, device=slices.device)
    ky_vals   = torch.arange(ky, dtype=torch.float32, device=slices.device)

    for s_idx in range(1, S_count + 1):
        # Eq. 3: phi(r,s) = -2pi * [(s-1)*((-1)^r) / (2 + (S-1)*(floor((r-1)/2)) / (floor(R/2)-1))]
        denom = 2 + (S - 1) * math.floor((r - 1) / 2) / max(1, math.floor(R / 2) - 1)
        shift_mag = (s_idx - 1) * ((-1) ** r) / denom
        phi = -2 * math.pi * shift_mag * ky_vals          # (ky,)
        phase_factor = torch.exp(1j * phi).unsqueeze(0)   # (1, ky)
        collapsed += slices[s_idx - 1] * phase_factor
    return collapsed


# ─── SECTION 7: Training Loss ─────────────────────────────────────────────────

def orcas_loss(
    clean_dwi: torch.Tensor,
    gt_dwi: torch.Tensor,
    clean_tensor: torch.Tensor,
    gt_tensor: torch.Tensor,
    dwi_weight: float = 1.0,
    tensor_weight: float = 1.0,
) -> torch.Tensor:
    """
    Combined MAE loss for scalar and directional branches.
    Both branches trained simultaneously against SB ground truth.
    
    Parameters
    ----------
    clean_dwi    : (B, R, H, W) predicted artefact-free DWI magnitudes
    gt_dwi       : (B, R, H, W) single-band reference DWI magnitudes
    clean_tensor : (B, 9, H, W) predicted diffusion tensors
    gt_tensor    : (B, 9, H, W) single-band reference tensors
    dwi_weight   : relative weight for scalar branch loss
    tensor_weight: relative weight for directional branch loss

    Returns
    -------
    total_loss : combined MAE loss (scalar)
    """
    loss_dwi    = F.l1_loss(clean_dwi, gt_dwi)
    loss_tensor = F.l1_loss(clean_tensor, gt_tensor)
    return dwi_weight * loss_dwi + tensor_weight * loss_tensor


# ─── SECTION 8: Smoke Test ─────────────────────────────────────────────────────

def _smoke_test():
    """
    End-to-end smoke test of ORCAS on synthetic cardiac DTI data.
    
    Verifies:
      - Forward pass through scalar and directional branches
      - Variable CAIPIRINHA shift simulation
      - Combined MAE loss computation
      - Backward pass through both NAFNet models
    """
    print("=" * 62)
    print("ORCAS Smoke Test -- Synthetic Cardiac DTI Data")
    print("Paper: Tänzer et al., Medical Image Analysis 112 (2026) 104115")
    print("=" * 62)

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"\nDevice: {device}")

    # Protocol parameters from paper
    B     = 2     # batch size
    R     = 1     # repetitions (extreme acceleration case)
    n_aux = 2     # 2 SB b0 auxiliaries for R=1 (paper Section 2.2.3)
    H, W  = 128, 128  # spatial resolution (cropped from 88x320)
    width = 32    # NAFNet base channels (reduced for smoke test)

    # Synthetic input tensors
    sms_mag   = torch.randn(B, R, H, W, device=device)
    sms_phase = torch.randn(B, R, H, W, device=device)
    aux_mag   = torch.randn(B, n_aux, H, W, device=device)
    aux_phase = torch.randn(B, n_aux, H, W, device=device)
    tensor_in = torch.randn(B, 9, H, W, device=device)

    # Ground truth references (single-band, artefact-free)
    gt_dwi    = torch.randn(B, R, H, W, device=device)
    gt_tensor = torch.randn(B, 9, H, W, device=device)

    model = ORCAS(R=R, n_aux=n_aux, width=width).to(device)
    model.train()

    total_params = sum(p.numel() for p in model.parameters())
    print(f"Total parameters: {total_params:,}")
    print(f"Protocol: R={R} reps, {n_aux} aux SB images, SMS factor=2")
    print(f"Spatial resolution: {H}x{W}")

    optimizer = torch.optim.Adam(
        model.parameters(), lr=1e-4, weight_decay=0
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=500, eta_min=1e-7
    )

    # Forward pass
    out = model(sms_mag, sms_phase, aux_mag, aux_phase, tensor_in)

    # Compute combined MAE loss and backprop
    loss = orcas_loss(
        out['clean_dwi'],    gt_dwi,
        out['clean_tensor'], gt_tensor,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

    print(f"\n{'─'*45}")
    print(f"Training loss:          {loss.item():.4f}")
    print(f"Clean DWI shape:        {list(out['clean_dwi'].shape)}")
    print(f"Clean tensor shape:     {list(out['clean_tensor'].shape)}")
    print(f"DWI magnitude range:    [{out['clean_dwi'].min():.3f}, {out['clean_dwi'].max():.3f}]")
    print(f"Tensor output range:    [{out['clean_tensor'].min():.3f}, {out['clean_tensor'].max():.3f}]")
    print(f"{'─'*45}")

    # Variable CAIPIRINHA shift demo (Eq. 3)
    print("Variable CAIPIRINHA shift demo:")
    synthetic_slices = torch.randn(2, 13, 64, dtype=torch.cfloat, device=device)
    R_total = 10
    for rep in [1, 2, 5, 10]:
        collapsed = variable_caipirinha_shift(synthetic_slices, R=R_total, r=rep, S=2)
        print(f"  rep={rep:2d}  collapsed shape={list(collapsed.shape)}")

    # Inference mode
    model.eval()
    with torch.no_grad():
        out_inf = model(sms_mag, sms_phase, aux_mag, aux_phase, tensor_in)
    # Note: this is an un-normalised RMSE against random targets, not an NRMSE.
    print(f"\nInference DWI RMSE (vs random GT): {F.mse_loss(out_inf['clean_dwi'], gt_dwi).sqrt().item():.4f}")
    print("Smoke test passed. ORCAS forward and backward cycles OK.")
    print("See Tänzer et al. 2026 Section 2.2.3 for the full training protocol.")
    print("=" * 62)


if __name__ == '__main__':
    _smoke_test()

What ORCAS Opens Up and Where the Gaps Remain

The most significant clinical implication of this work is that whole-heart cardiac DTI at diagnostic quality becomes feasible within a single clinical session. A biphasic acquisition covering systole and diastole across all twelve slices at SMS factor 2 with a single averaged repetition and two auxiliary single-band images takes under thirteen minutes. That is within the range of what is routinely tolerated for cardiac MRI protocols. The technique no longer requires a dedicated half-day slot or a research-grade facility willing to run multi-hour acquisitions.

The biomarker accuracy at this acceleration level is clinically meaningful rather than just statistically distinguishable. A helix angle error of 13.3 degrees and a fractional anisotropy error of 0.053 sit close to what conventional reconstruction achieves at ten repetitions with SMS factor 2 and no AI (13.1 degrees and 0.037 respectively), with the helix angle error essentially matched and the fractional anisotropy error somewhat higher. Coming that close from a single repetition is a remarkable outcome given the 18-fold difference in acquisition time.
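To make those error figures concrete, here is a minimal sketch of how per-voxel biomarkers and their errors could be computed from tensor eigenvalues. It uses the standard textbook definitions of mean diffusivity and fractional anisotropy, and it assumes that angle errors are measured on axial (180-degree periodic) orientations. The function names are hypothetical, and the paper's exact error metric (masking, averaging, angle convention) may differ.

```python
import math

def mean_diffusivity(evals):
    """Mean of the three diffusion tensor eigenvalues."""
    return sum(evals) / 3.0

def fractional_anisotropy(evals):
    """Standard FA: 0 for isotropic diffusion, 1 for diffusion along one axis."""
    md = mean_diffusivity(evals)
    num = sum((lam - md) ** 2 for lam in evals)
    den = sum(lam ** 2 for lam in evals)
    if den == 0.0:
        return 0.0  # empty voxel: define FA as 0 to avoid division by zero
    return math.sqrt(1.5 * num / den)

def axial_angle_error_deg(a, b):
    """Absolute difference between two orientations that repeat every 180 degrees.

    Assumption: angles such as helix angle describe an axis, not a direction,
    so +85 and -85 degrees are 10 degrees apart rather than 170.
    """
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def mean_abs_error(reference, reconstruction, error_fn=lambda a, b: abs(a - b)):
    """Average a per-voxel error function over paired lists of values."""
    errors = [error_fn(r, x) for r, x in zip(reference, reconstruction)]
    return sum(errors) / len(errors)

# Toy usage with made-up eigenvalues (arbitrary units), not data from the paper.
ref_fa = [fractional_anisotropy(e) for e in [(1.6, 1.0, 0.7), (1.4, 1.1, 0.9)]]
rec_fa = [fractional_anisotropy(e) for e in [(1.5, 1.0, 0.8), (1.4, 1.0, 0.9)]]
print(f"FA mean absolute error: {mean_abs_error(ref_fa, rec_fa):.3f}")
print(f"HA mean absolute error: {mean_abs_error([85.0, 10.0], [-85.0, 25.0], axial_angle_error_deg):.1f} deg")
```

The wrap-around in `axial_angle_error_deg` matters near the ends of the angular range, where a naive subtraction would report a large error for two nearly identical orientations.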

There are genuine limitations that warrant honest discussion. The entire validation was conducted on ex-vivo hearts. Ex-vivo tissue does not move. It does not breathe. It does not have a beating cardiac cycle. In-vivo cardiac DTI must be gated to the cardiac cycle to freeze cardiac motion, and respiratory motion causes slice-to-slice misalignment that further degrades the signal. The through-plane displacement and phase errors introduced by motion in real subjects will compound the artefact correction challenges that ORCAS addresses in the static ex-vivo setting. The authors acknowledge this directly and outline a structured path toward in-vivo validation.

The advantage of the variable shift strategy also diminishes as the number of repetitions decreases, because fewer repetitions give the AI model fewer distinct artefact patterns to learn from. This is why the extreme single-repetition case required two auxiliary single-band images instead of one: they maintain the artefact decorrelation that the variable shift alone could not provide at such low repetition counts. A direction the authors flag for future work is applying diffusion-direction-dependent CAIPIRINHA shifts, which would allow artefact decorrelation even within a single repetition by exploiting the fact that different diffusion directions are acquired sequentially.
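The decorrelation argument rests on the Fourier shift theorem: a linear phase ramp across k-space moves a slice along the phase-encode direction, so changing the ramp between repetitions moves the overlapping slice to a different place in the collapsed image. The toy sketch below shows this on a one-dimensional, eight-pixel profile in pure Python. The shift schedule `[0.0, 0.25, 0.5, 0.75]` is purely illustrative and is not the paper's Eq. 3 or the listing's `variable_caipirinha_shift`; all function names are hypothetical, and the shift is assumed to be a whole number of pixels.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, enough for a toy profile."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def shift_via_phase_ramp(profile, shift_fraction):
    """Circularly shift a profile by shift_fraction of the field of view.

    Assumption: shift_fraction * len(profile) is an integer number of pixels.
    """
    n = len(profile)
    kspace = dft(profile)
    ramped = [kspace[k] * cmath.exp(-2j * cmath.pi * k * shift_fraction) for k in range(n)]
    return dft(ramped, inverse=True)

def collapse(slice_a, slice_b, shift_fraction):
    """Simulate a two-slice SMS readout: slice_b is shifted, then both are summed."""
    shifted_b = shift_via_phase_ramp(slice_b, shift_fraction)
    return [abs(a + b) for a, b in zip(slice_a, shifted_b)]

# Two toy slices, each with a single bright pixel.
slice_a = [0, 1, 0, 0, 0, 0, 0, 0]
slice_b = [0, 0, 1, 0, 0, 0, 0, 0]

# Illustrative per-repetition shift schedule (hypothetical).
for rep, fraction in enumerate([0.0, 0.25, 0.5, 0.75], start=1):
    signal = [round(v) for v in collapse(slice_a, slice_b, fraction)]
    print(f"rep={rep}  shift={fraction:.2f} FOV  collapsed={signal}")
```

With a fixed shift, the contribution from `slice_b` would land on the same pixel in every repetition. With a varying shift it lands on a different pixel each time, which is the incoherence a learned reconstruction can exploit, and which is unavailable when only one repetition is acquired.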

The comparison with the U-Net baseline is worth examining carefully. The U-Net was trained with the fixed CAIPIRINHA strategy, without auxiliary input, and without a tensor branch, so it conflates multiple design decisions. It is not a pure architecture comparison. But that is precisely the point the authors are making. A standard deep learning reconstruction pipeline applied to the same problem, without the deliberate engineering of artefact incoherence and without domain-appropriate branching, performs meaningfully worse. The improvements in ORCAS come from the system design, not from having a bigger or fancier neural network.

The broader principle here is one that medical imaging researchers have started to articulate clearly but that has not always been practiced consistently. Data acquisition and AI reconstruction should be designed together as a joint system rather than sequentially. The variable CAIPIRINHA scheme produces data that is harder to reconstruct with conventional methods but easier to reconstruct with AI. The auxiliary single-band data costs a few percent of total scan time but enables a reconstruction quality that no amount of network scaling can achieve from the SMS data alone. These are choices that have to be made at the pulse sequence design stage, not the post-processing stage. ORCAS demonstrates that making them deliberately and jointly produces results that neither approach can achieve independently.

Read the Full Paper

The complete ORCAS paper with full experimental results, ablation studies across SMS factors 2 and 3, and validation on the ex-vivo abnormality model is available open access via ScienceDirect.


Academic Citation:
Tänzer, M., Lim, E. J., Qiu, H. H., Munoz, C., Scott, A., Pennell, D., Ferreira, P., Rueckert, D., Yang, G., & Nielles-Vallespin, S. (2026). Simultaneous multi-slice Cardiac Diffusion Tensor Imaging with variable CAIPIRINHA shifts and artefact-aware AI. Medical Image Analysis, 112, 104115. https://doi.org/10.1016/j.media.2026.104115

This article is an independent editorial analysis of peer-reviewed research. The PyTorch implementation is an educational reproduction and may differ from any official repository in engineering details. For research use, verify against the original paper. This work was supported by the UKRI CDT in AI for Healthcare (Grant No. EP/S023283/1), the British Heart Foundation, and the Chan Zuckerberg Initiative DAF.