Cardiovascular diseases remain the leading cause of death worldwide, making accurate and early diagnosis critical. Among the most vital metrics in cardiac assessment is the Ejection Fraction (EF), the percentage of blood the left ventricle ejects with each contraction. Traditionally, EF is calculated from manual segmentation of echocardiography videos, a process that is both time-consuming and prone to inter-observer variability.
Recent advances in deep learning for echocardiography video segmentation have paved the way for automated EF estimation. However, many models struggle to translate high segmentation accuracy into reliable EF predictions due to noise, motion artifacts, and limited annotated data. To bridge this gap, researchers from Nanjing University of Science and Technology have introduced the Hierarchical Spatio-temporal Segmentation Network (HSS-Net), a model that significantly improves EF estimation accuracy by pairing local detail preservation with global motion understanding.
In this article, we’ll explore how HSS-Net works, why it outperforms existing methods, and what it means for the future of AI-powered cardiac diagnostics.
The Challenge: Why Accurate EF Estimation is So Difficult
Echocardiography, or cardiac ultrasound, is a non-invasive imaging technique used to evaluate heart function. Despite its widespread use, echocardiographic images suffer from several limitations:
- Speckle noise and artifacts that obscure anatomical boundaries
- Low contrast and blurred edges, especially in the left ventricular endocardium
- Complex cardiac motion during the heartbeat cycle
- Sparse annotations—only End-Diastolic (ED) and End-Systolic (ES) frames are typically labeled
These challenges make automated segmentation difficult. Even if a model achieves a high Dice score (a common segmentation accuracy metric), small errors in critical regions—such as the apex and base of the left ventricle—can lead to large biases in EF calculation. As shown in Figure 1 of the paper, minor segmentation inaccuracies can result in EF deviations of up to 11.18%, rendering the clinical utility of such models questionable.
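To see why, recall how EF is computed from the segmented left ventricle: the End-Diastolic Volume (EDV) and End-Systolic Volume (ESV) are estimated from the ED and ES masks, and
\[ \text{EF} = \frac{\text{EDV} - \text{ESV}}{\text{EDV}} \times 100\% \]
Because EF is a ratio of two volumes derived from just two annotated frames, even modest over- or under-segmentation at ED or ES shifts both the numerator and the denominator, and the error is amplified in the final percentage.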
Introducing HSS-Net: A Hybrid Architecture for Precision Cardiac Analysis
The Hierarchical Spatio-temporal Segmentation Network (HSS-Net) is designed to overcome the limitations of existing models by combining the strengths of convolutional networks and state-space modeling (Mamba architecture) in a unified, hierarchical framework.
Key Innovation: Hierarchical Design
HSS-Net follows a symmetric encoder-decoder architecture (see Figure 2), but with a crucial difference: it processes data at multiple levels of abstraction.
- Low-level stages (Stages 1–2): Use separable convolution blocks to extract fine-grained spatial details from individual frames. This ensures that edges, textures, and subtle anatomical features are preserved.
- High-level stages (Stages 3–4): Employ Spatio-temporal Mamba Blocks to model long-range temporal dependencies across multiple frames, capturing the heart’s dynamic motion.
This hierarchical approach avoids two common pitfalls:
- Over-reliance on single frames, which ignores motion continuity.
- Overuse of multi-frame processing, which can blur fine details.
By balancing local and global processing, HSS-Net achieves superior segmentation consistency and, more importantly, more accurate EF estimation.
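To make the split concrete, here is one plausible encoder layout. The block names and channel widths come from the reference implementation at the end of this article and should be read as illustrative, not as the paper's exact configuration.

# Illustrative encoder layout for HSS-Net's hierarchical design (a sketch, not the
# authors' exact configuration). Stages 1-2 use per-frame separable convolutions;
# stages 3-4 use spatio-temporal Mamba blocks over the whole clip.
ENCODER_STAGES = [
    {"stage": 1, "block": "SeparableConvBlock",       "scope": "per frame",  "channels": 32},
    {"stage": 2, "block": "SeparableConvBlock",       "scope": "per frame",  "channels": 64},
    {"stage": 3, "block": "SpatioTemporalMambaBlock", "scope": "whole clip", "channels": 128},
    {"stage": 4, "block": "SpatioTemporalMambaBlock", "scope": "whole clip", "channels": 256},
]

for s in ENCODER_STAGES:
    print(f"Stage {s['stage']}: {s['block']:<25} ({s['scope']}, {s['channels']} channels)")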
Core Component: The Spatio-temporal Cross Scan (STCS) Module
One of the most innovative aspects of HSS-Net is the Spatio-temporal Cross Scan (STCS) module, which enables the model to capture complex motion patterns from multiple perspectives.
How STCS Works
The STCS module applies four distinct scanning modes to the spatio-temporal feature sequence:
- Temporal Forward & Backward Scan – Captures motion trends over time at each spatial location.
- Spatial Up & Down Scan – Analyzes spatial coherence across the image at each time point.
- Spatio-temporal Diagonal Scan – Integrates information across both space and time in a diagonal pattern.
- Spatio-temporal Anti-diagonal Scan – Uses a reverse diagonal pattern to break local correlations.
These scans are based on Selective State Space Models (S6), which allow efficient, linear-time processing of long sequences while maintaining high fidelity.
By combining these scanning strategies, STCS effectively models:
- Apex motion changes
- Lateral wall contraction
- Inter-frame consistency
- Long-range dependencies
This leads to smoother, more physiologically plausible segmentation masks, reducing jitter and abnormal boundary fluctuations.
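To make these scan orders concrete, here is a minimal sketch of how the four sequence orderings can be built from a toy (T, H, W, C) feature tensor. The roll-based diagonal construction is an assumption that mirrors the conceptual approach in the reference implementation later in this article; the paper's exact reordering may differ.

import torch

# Toy spatio-temporal feature tensor: T frames, H x W spatial grid, C channels.
T, H, W, C = 4, 3, 3, 2
x = torch.randn(T, H, W, C)

# 1. Temporal scan: one length-T sequence per spatial location (flip along dim=1 for the backward scan).
temporal = x.permute(1, 2, 0, 3).reshape(H * W, T, C)

# 2. Spatial scan: one length-H*W sequence per frame, read top-to-bottom (flip for the upward scan).
spatial = x.reshape(T, H * W, C)

# 3. Diagonal scan: shift the rows of frame t by t so the time axis sweeps diagonally through space.
diag = torch.stack([torch.roll(x[t], shifts=t, dims=0) for t in range(T)])
diag = diag.permute(1, 2, 0, 3).reshape(H * W, T, C)

# 4. Anti-diagonal scan: the same idea with the opposite shift direction.
anti = torch.stack([torch.roll(x[t], shifts=-t, dims=0) for t in range(T)])
anti = anti.permute(1, 2, 0, 3).reshape(H * W, T, C)

print(temporal.shape, spatial.shape, diag.shape, anti.shape)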
Mathematical Formulation of HSS-Net Components
To understand HSS-Net’s inner workings, let’s break down its key equations using LaTeX-formatted math.
1. Separable Convolution Block (Low-level Processing)
For a feature map \( F_i \in \mathbb{R}^{T \times C_i \times H_i \times W_i} \) at stage \( i \in \{1, 2\} \), the separable convolution block applies layer normalization (LN), a depthwise separable convolution (SC), and a feed-forward network (FFN):
\[ F_i = \text{SC}\!\left(\text{LN}(F_i)\right) + F_i \] \[ F_i = \text{FFN}\!\left(\text{LN}(F_i)\right) + F_i \]
This residual design preserves fine details while enhancing edge and texture representation.
2. Spatio-temporal Mamba Block (High-level Processing)
At the higher stages \( i \in \{3, 4\} \), the feature map is reshaped into a 1D sequence \( s_i \in \mathbb{R}^{C_i \times T \cdot H_i \cdot W_i} \). The STCS module processes this sequence:
\[ s_i = \text{STCS}(\text{LN}(s_i)) + s_i \] \[ s_i = \text{FFN}(\text{LN}(s_i)) + s_i \]
After processing, the sequence is reshaped back to its original dimensions and passed through upsampling layers in the decoder.
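The reshape itself is a simple flatten-and-restore round trip. The sketch below mirrors how the reference implementation at the end of this article moves between the video layout and the token sequence consumed by the Mamba block (the paper writes the sequence channels-first, which is equivalent up to a transpose).

import torch

# Video feature map at a high-level stage: (B, T, C, H, W).
B, T, C, H, W = 1, 4, 128, 16, 16
f = torch.randn(B, T, C, H, W)

# Flatten into a single token sequence of length T*H*W with channels last.
seq = f.flatten(3).permute(0, 1, 3, 2).reshape(B, T * H * W, C)

# ...the STCS / Mamba processing would operate on `seq` here...

# Restore the original video layout for the decoder.
restored = seq.reshape(B, T, H * W, C).permute(0, 1, 3, 2).reshape(B, T, C, H, W)
assert torch.equal(f, restored)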

3. Loss Function for Training
HSS-Net is trained using a combination of Dice loss and binary cross-entropy loss, weighted to emphasize segmentation accuracy:
\[ L_{\text{total}} = \alpha \cdot L_{\text{dice}}(P, G) + (1-\alpha) \cdot L_{\text{bce}}(P, G) \]
where:
- \( P \): predicted segmentation mask
- \( G \): ground truth mask
- \( \alpha = 0.8 \) (empirically chosen to prioritize the Dice term)
This hybrid loss ensures robust training even with noisy or sparse labels.
Benchmark Performance: HSS-Net Outperforms State-of-the-Art
The authors evaluated HSS-Net on three public echocardiography datasets:
- CAMUS
- EchoNet-Pediatric
- EchoNet-Dynamic
Results show that HSS-Net achieves state-of-the-art performance in both segmentation and EF estimation.
Table 1: Performance on CAMUS Dataset
METHOD | PARAMS (M) | FLOPS (G) | EF CORR (%) | EF BIAS ± STD | DICE (%) | HD95 |
---|---|---|---|---|---|---|
UNet++ | 26.9 | 37.7 | 81.68 | 6.05±6.81 | 91.87 | 16.16 |
TransUNet | 105.3 | 38.6 | 86.22 | 1.72±6.07 | 92.73 | 13.71 |
HSS-Net (Ours) | 31.2 | 5.6 | 90.47 | 2.43±5.02 | 93.89 | 11.29 |
✅ Key Insight: HSS-Net achieves the highest correlation and the tightest error spread (±5.02) while using nearly 7× fewer FLOPs (5.6G vs. 38.6G) than TransUNet.
Table 2: Results on Pediatric and Adult Datasets
METHOD | ECHONET-PEDIATRIC (CORR / BIAS) | ECHONET-DYNAMIC (CORR / BIAS) | DICE (AVG) |
---|---|---|---|
PKEchoNet | 65.04 / 6.23±11.21 | 75.43 / 4.34±9.50 | 91.73 |
VideoMamba | 67.34 / 6.39±11.39 | 78.62 / 4.50±8.29 | 91.77 |
HSS-Net (Ours) | 76.91 / 1.29±8.68 | 84.50 / 0.95±6.75 | 92.29 |
📈 Improvement: HSS-Net reduces EF estimation bias by up to 80% compared to prior methods.
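If you want to reproduce these numbers, both EF metrics can be computed directly from per-video predictions. The sketch below assumes "Corr" is the Pearson correlation coefficient and "Bias ± Std" is the mean and standard deviation of the prediction error, the usual convention in the EF-estimation literature; the paper's own evaluation script may differ in detail.

import numpy as np

def ef_metrics(ef_pred, ef_true):
    """Correlation (%) and bias ± std of predicted vs. ground-truth EF values."""
    ef_pred, ef_true = np.asarray(ef_pred, dtype=float), np.asarray(ef_true, dtype=float)
    corr = np.corrcoef(ef_pred, ef_true)[0, 1] * 100  # Pearson correlation, reported in %
    errors = ef_pred - ef_true
    return corr, errors.mean(), errors.std()

# Toy example with three videos (values are illustrative, not from the paper).
corr, bias, std = ef_metrics([55.2, 40.1, 62.8], [57.0, 38.5, 61.0])
print(f"Corr: {corr:.2f}%   Bias: {bias:.2f} ± {std:.2f}")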
Ablation Studies: Why Each Component Matters
To validate the design choices, the authors conducted ablation studies.
Table 3: Ablation Study Results
CONFIGURATION | CAMUS (CORR / BIAS) | ECHONET-DYN (CORR / BIAS) | DICE |
---|---|---|---|
Image-level (Conv only) | 83.48 / 5.28±6.71 | 74.79 / 6.00±9.34 | 92.93 |
Video-level (Mamba only) | 80.67 / 4.48±7.04 | 78.01 / 4.62±7.92 | 93.04 |
w/o ST Diagonal Scan | 86.69 / 4.05±5.99 | 79.65 / 6.03±7.70 | 93.21 |
w/o ST Anti-diagonal Scan | 88.09 / 2.95±5.57 | 81.11 / 4.85±7.54 | 93.27 |
HSS-Net (Full) | 90.47 / 2.43±5.02 | 84.50 / 0.95±6.75 | 93.89 |
Key Findings:
- Removing diagonal or anti-diagonal scans reduces performance, proving their role in capturing complex spatio-temporal dynamics.
- Using only convolutional blocks limits temporal modeling.
- Using only Mamba blocks sacrifices fine spatial detail.
- The hierarchical design is essential for optimal balance.
Why HSS-Net is a Game-Changer for Clinical Practice
1. Improved EF Estimation Accuracy
HSS-Net reduces EF estimation bias to under 1% on EchoNet-Dynamic, making it suitable for clinical deployment where even small errors can impact treatment decisions.
2. Robustness to Image Quality
Thanks to the STCS module, HSS-Net handles noise, artifacts, and low contrast better than transformer- or CNN-based models.
3. Efficient Computation
With only 31.2M parameters and 5.6G FLOPs, HSS-Net is lightweight enough for real-time inference on consumer-grade GPUs.
4. Generalization Across Populations
It performs well on both pediatric (EchoNet-Pediatric) and adult (EchoNet-Dynamic) datasets, showing strong cross-domain generalization.
Comparison with Other Deep Learning Approaches
METHOD | ARCHITECTURE | TEMPORAL MODELING | EF CORRELATION | COMPUTATIONAL COST |
---|---|---|---|---|
UNet++ | CNN | None | ~80% | High |
TransUNet | Transformer | Self-attention | ~86% | Very High |
VideoMamba | Mamba | Linear SSM | ~78% | Medium |
HSS-Net | Hybrid CNN-Mamba | Cross-scan SSM | 90.5% | Low |
🔍 Takeaway: Among the compared methods, HSS-Net is the only one that combines high accuracy, low bias, and low computational cost.
Real-World Applications and Future Directions
HSS-Net has the potential to transform:
- Telemedicine: Enable remote EF analysis in underserved areas.
- Pediatric Cardiology: Automate assessment in children, where image variability is high.
- Longitudinal Monitoring: Track EF changes over time with consistent, objective measurements.
Future work could extend HSS-Net to:
- 3D echocardiography
- Multi-view fusion (A2C + A4C)
- Integration with electronic health records (EHR)
Conclusion: The Future of AI in Cardiac Imaging is Hierarchical
The Hierarchical Spatio-temporal Segmentation Network (HSS-Net) represents a major step forward in automated echocardiography analysis. By combining local detail extraction with global motion modeling, it delivers state-of-the-art accuracy in Ejection Fraction estimation, one of the most important measures of cardiac function.
With superior performance, robustness to noise, and low computational demands, HSS-Net is not just a research breakthrough—it’s a practical solution ready for clinical integration.
Call to Action: Explore the Code and Join the Revolution
Want to try HSS-Net in your own research or clinical workflow?
👉 Visit the official GitHub repository: https://github.com/DF-W/HSS-Net
The code is open-source, well-documented, and easy to adapt for new datasets or hardware setups.
Have questions?
Drop a comment below or reach out to the authors via email:
- Dongfang Wang: dongfangwang@njust.edu.cn
- Jian Yang: csjyang@njust.edu.cn
Let’s work together to bring AI-powered cardiac care to every patient, everywhere.
Below is a reference PyTorch implementation of HSS-Net that follows the architecture described in the paper. Treat it as an illustrative reimplementation rather than the authors' official code, which is available in the GitHub repository linked above.
# HSS-Net: Hierarchical Spatio-temporal Segmentation Network
# This code implements the HSS-Net model as described in the paper:
# "Hierarchical Spatio-temporal Segmentation Network for Ejection Fraction Estimation in Echocardiography Videos"
# by Dongfang Wang, Jian Yang, Yizhe Zhang, and Tao Zhou.
#
# The implementation requires PyTorch and the mamba-ssm package.
# You can install mamba-ssm via pip: pip install mamba-ssm
import torch
import torch.nn as nn
from mamba_ssm import Mamba
class HSSNet(nn.Module):
"""
Main class for the Hierarchical Spatio-temporal Segmentation Network (HSS-Net).
This network combines convolutional layers for low-level spatial details and
Mamba blocks for high-level spatio-temporal modeling.
"""
def __init__(self, in_channels=1, out_channels=1, dims=[32, 64, 128, 256]):
"""
Initializes the HSS-Net model.
Args:
in_channels (int): Number of input channels (e.g., 1 for grayscale).
out_channels (int): Number of output channels (e.g., 1 for binary segmentation).
dims (list): A list of channel dimensions for each stage of the network.
"""
super().__init__()
# --- Encoder ---
# Initial patch embedding layer
self.patch_embed = nn.Conv2d(in_channels, dims[0], kernel_size=4, stride=4)
# Stage 1: Separable Convolution Blocks for low-level features
self.encoder1 = nn.Sequential(
SeparableConvBlock(dims[0], dims[0]),
SeparableConvBlock(dims[0], dims[0])
)
self.down1 = nn.Conv2d(dims[0], dims[1], kernel_size=2, stride=2)
# Stage 2: More Separable Convolution Blocks
self.encoder2 = nn.Sequential(
SeparableConvBlock(dims[1], dims[1]),
SeparableConvBlock(dims[1], dims[1])
)
self.down2 = nn.Conv2d(dims[1], dims[2], kernel_size=2, stride=2)
# Stage 3: Spatio-temporal Mamba Blocks for high-level features
self.encoder3 = nn.Sequential(
SpatioTemporalMambaBlock(dims[2]),
SpatioTemporalMambaBlock(dims[2])
)
self.down3 = nn.Conv2d(dims[2], dims[3], kernel_size=2, stride=2)
# Stage 4: More Spatio-temporal Mamba Blocks
self.encoder4 = nn.Sequential(
SpatioTemporalMambaBlock(dims[3]),
SpatioTemporalMambaBlock(dims[3])
)
# --- Decoder ---
# Stage 4 Decoder
self.up4 = nn.ConvTranspose2d(dims[3], dims[2], kernel_size=2, stride=2)
self.decoder4 = nn.Sequential(
SpatioTemporalMambaBlock(dims[2]),
SpatioTemporalMambaBlock(dims[2])
)
# Stage 3 Decoder
self.up3 = nn.ConvTranspose2d(dims[2], dims[1], kernel_size=2, stride=2)
self.decoder3 = nn.Sequential(
SpatioTemporalMambaBlock(dims[1]),
SpatioTemporalMambaBlock(dims[1])
)
# Stage 2 Decoder
self.up2 = nn.ConvTranspose2d(dims[1], dims[0], kernel_size=2, stride=2)
self.decoder2 = nn.Sequential(
SeparableConvBlock(dims[0], dims[0]),
SeparableConvBlock(dims[0], dims[0])
)
# Stage 1 Decoder
self.up1 = nn.ConvTranspose2d(dims[0], dims[0] // 2, kernel_size=4, stride=4)
self.decoder1 = nn.Sequential(
SeparableConvBlock(dims[0] // 2, dims[0] // 2)
)
# Final output convolution
self.final_conv = nn.Conv2d(dims[0] // 2, out_channels, kernel_size=1)
def forward(self, x):
"""
Forward pass of the HSS-Net.
Args:
x (torch.Tensor): Input tensor of shape (B, T, C, H, W),
where B=batch, T=frames, C=channels, H=height, W=width.
Returns:
torch.Tensor: The segmentation mask of shape (B, T, C_out, H, W).
"""
B, T, C, H, W = x.shape
x = x.view(B * T, C, H, W)
# --- Encoder Path ---
e1 = self.patch_embed(x)
e1 = self.encoder1(e1)
e2 = self.down1(e1)
e2 = self.encoder2(e2)
e3 = self.down2(e2)
# Reshape for Mamba blocks (B*T, C, H, W) -> (B, T, C, H, W)
_, c3, h3, w3 = e3.shape
e3 = e3.view(B, T, c3, h3, w3)
e3 = self.encoder3(e3)
e4 = self.down3(e3.contiguous().view(B * T, c3, h3, w3))
_, c4, h4, w4 = e4.shape
e4 = e4.view(B, T, c4, h4, w4)
e4 = self.encoder4(e4)
# --- Decoder Path ---
d4 = self.up4(e4.contiguous().view(B * T, c4, h4, w4))
_, c_d4, h_d4, w_d4 = d4.shape
d4 = d4.view(B, T, c_d4, h_d4, w_d4)
d4 = self.decoder4(d4 + e3) # Skip connection
d3 = self.up3(d4.contiguous().view(B * T, c_d4, h_d4, w_d4))
_, c_d3, h_d3, w_d3 = d3.shape
d3 = d3.view(B, T, c_d3, h_d3, w_d3)
        # Mamba-based decoder blocks here, mirroring encoder stages 3 & 4.
        # Skip connection with the stage-2 encoder features, reshaped to the video layout.
        d3 = self.decoder3(d3 + e2.view(B, T, c_d3, h_d3, w_d3))
# Reshape back to 2D for convolutional blocks
d3_2d = d3.contiguous().view(B * T, c_d3, h_d3, w_d3)
d2 = self.up2(d3_2d)
        d2 = self.decoder2(d2 + e1)  # Skip connection with the stage-1 encoder features
d1 = self.up1(d2)
        # No skip connection at the final stage: up1 undoes the stride-4 patch embedding
        # and restores the input resolution, where no encoder feature map exists.
        d1 = self.decoder1(d1)
out = self.final_conv(d1)
out = torch.sigmoid(out) # Apply sigmoid for binary segmentation
# Reshape back to (B, T, C_out, H, W)
out = out.view(B, T, self.final_conv.out_channels, H, W)
return out
class SeparableConvBlock(nn.Module):
    """
    Separable Convolution Block (depthwise + pointwise convolution).
    Mirrors the paper's two residual branches: LN -> separable convolution -> residual,
    then LN -> FFN -> residual. LayerNorm is applied over the channel dimension
    (channels-last), so the block works at any spatial resolution.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        assert in_channels == out_channels, "Residual connections require matching channel counts."
        self.norm1 = nn.LayerNorm(in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1,
                                   groups=in_channels, bias=False)  # Depthwise
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)  # Pointwise
        self.act = nn.GELU()
        self.norm2 = nn.LayerNorm(out_channels)
        self.ffn = FeedForward(out_channels)

    def forward(self, x):
        # x: (B*T, C, H, W)
        # Branch 1: channel-wise LayerNorm -> depthwise separable convolution -> residual
        y = self.norm1(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x + self.act(self.pointwise(self.depthwise(y)))
        # Branch 2: channel-wise LayerNorm -> per-pixel FFN -> residual
        y = self.norm2(x.permute(0, 2, 3, 1))  # (B*T, H, W, C)
        x = x + self.ffn(y).permute(0, 3, 1, 2)
        return x
class SpatioTemporalMambaBlock(nn.Module):
"""
Spatio-temporal Mamba Block.
This block uses the Spatio-temporal Cross Scan (STCS) module to capture
long-range dependencies in video data.
"""
def __init__(self, dim, d_state=16, d_conv=4, expand=2):
super().__init__()
self.norm1 = nn.LayerNorm(dim)
self.stcs = STCS(dim, d_state=d_state, d_conv=d_conv, expand=expand)
self.norm2 = nn.LayerNorm(dim)
self.ffn = FeedForward(dim)
def forward(self, x):
"""
Args:
x (torch.Tensor): Input of shape (B, T, C, H, W)
"""
B, T, C, H, W = x.shape
# Flatten spatial dimensions and permute for Mamba: (B, T, C, H, W) -> (B, T*H*W, C)
x_flat = x.flatten(3).permute(0, 1, 3, 2).reshape(B, T * H * W, C)
# Apply STCS
x_mamba = self.stcs(self.norm1(x_flat), T, H, W)
x = x_flat + x_mamba
# Apply FFN
x = x + self.ffn(self.norm2(x))
# Reshape back to original: (B, T*H*W, C) -> (B, T, C, H, W)
x = x.reshape(B, T, H * W, C).permute(0, 1, 3, 2).reshape(B, T, C, H, W)
return x
class STCS(nn.Module):
"""
Spatio-temporal Cross Scan (STCS) Module.
This module applies Mamba along four different scan directions to capture
comprehensive spatio-temporal relationships.
"""
def __init__(self, dim, d_state=16, d_conv=4, expand=2):
super().__init__()
self.dim = dim
self.mamba = Mamba(
d_model=dim,
d_state=d_state,
d_conv=d_conv,
expand=expand,
)
def forward(self, x, T, H, W):
"""
Args:
x (torch.Tensor): Input of shape (B, L, C) where L = T*H*W
T, H, W (int): Temporal, Height, and Width dimensions.
"""
B, L, C = x.shape
assert L == T * H * W
# Reshape to (B, T, H, W, C) for easier manipulation
x_reshaped = x.reshape(B, T, H, W, C)
# 1. Temporal Scan (Forward & Backward)
# Scan along the T dimension
temporal_scan = x_reshaped.permute(0, 2, 3, 1, 4).contiguous().view(B * H * W, T, C)
temporal_out = self.mamba(temporal_scan) + self.mamba(temporal_scan.flip(dims=[1])).flip(dims=[1])
temporal_out = temporal_out.view(B, H, W, T, C).permute(0, 3, 1, 2, 4)
# 2. Spatial Scan (Forward & Backward)
# Scan along the H dimension (as an example, could be W or combined H*W)
spatial_scan = x_reshaped.permute(0, 1, 3, 2, 4).contiguous().view(B * T * W, H, C)
spatial_out = self.mamba(spatial_scan) + self.mamba(spatial_scan.flip(dims=[1])).flip(dims=[1])
spatial_out = spatial_out.view(B, T, W, H, C).permute(0, 1, 3, 2, 4)
# 3. Spatio-temporal Diagonal Scan
# This is a conceptual implementation. A more rigorous one might involve custom CUDA kernels.
# We simulate it by shifting and scanning.
diag_scan_list = []
for t in range(T):
# Shift each frame's rows
shifted = torch.roll(x_reshaped[:, t, :, :, :], shifts=t, dims=1)
diag_scan_list.append(shifted)
diag_scan_tensor = torch.stack(diag_scan_list, dim=1)
diag_scan_flat = diag_scan_tensor.permute(0, 2, 3, 1, 4).contiguous().view(B * H * W, T, C)
diag_out_flat = self.mamba(diag_scan_flat) + self.mamba(diag_scan_flat.flip(dims=[1])).flip(dims=[1])
diag_out_tensor = diag_out_flat.view(B, H, W, T, C).permute(0, 3, 1, 2, 4)
# Un-shift to restore original alignment
diag_out_list = []
for t in range(T):
unshifted = torch.roll(diag_out_tensor[:, t, :, :, :], shifts=-t, dims=1)
diag_out_list.append(unshifted)
diag_out = torch.stack(diag_out_list, dim=1)
# 4. Spatio-temporal Anti-diagonal Scan
anti_diag_scan_list = []
for t in range(T):
shifted = torch.roll(x_reshaped[:, t, :, :, :], shifts=-t, dims=1)
anti_diag_scan_list.append(shifted)
anti_diag_scan_tensor = torch.stack(anti_diag_scan_list, dim=1)
anti_diag_scan_flat = anti_diag_scan_tensor.permute(0, 2, 3, 1, 4).contiguous().view(B * H * W, T, C)
anti_diag_out_flat = self.mamba(anti_diag_scan_flat) + self.mamba(anti_diag_scan_flat.flip(dims=[1])).flip(dims=[1])
anti_diag_out_tensor = anti_diag_out_flat.view(B, H, W, T, C).permute(0, 3, 1, 2, 4)
anti_diag_out_list = []
for t in range(T):
unshifted = torch.roll(anti_diag_out_tensor[:, t, :, :, :], shifts=t, dims=1)
anti_diag_out_list.append(unshifted)
anti_diag_out = torch.stack(anti_diag_out_list, dim=1)
# Combine the outputs from all scans
combined_out = (temporal_out + spatial_out + diag_out + anti_diag_out) / 4.0
return combined_out.flatten(1, 3)
class FeedForward(nn.Module):
"""
A simple FeedForward Network block.
"""
def __init__(self, dim, hidden_dim_ratio=4):
super().__init__()
hidden_dim = int(dim * hidden_dim_ratio)
self.net = nn.Sequential(
nn.Linear(dim, hidden_dim),
nn.GELU(),
nn.Linear(hidden_dim, dim),
)
def forward(self, x):
return self.net(x)
# Example Usage
if __name__ == '__main__':
# Create a dummy input tensor
# B=2, T=10 frames, C=1 (grayscale), H=256, W=256
dummy_video = torch.randn(2, 10, 1, 256, 256)
# Initialize the model
model = HSSNet(in_channels=1, out_channels=1)
# Get the model output
print("Initializing HSS-Net...")
output = model(dummy_video)
# Print input and output shapes
print(f"Input shape: {dummy_video.shape}")
print(f"Output shape: {output.shape}")
# Check if the output shape is as expected
assert dummy_video.shape == output.shape, "Output shape does not match input shape!"
print("\nModel forward pass successful!")
# --- Loss Function (as described in the paper) ---
    def dice_loss(pred, target, smooth=1.):
        # Flatten the channel and spatial dimensions so Dice is computed per frame.
        pred = pred.contiguous().flatten(2)
        target = target.contiguous().flatten(2)
        intersection = (pred * target).sum(dim=-1)
        loss = 1 - (2. * intersection + smooth) / (pred.sum(dim=-1) + target.sum(dim=-1) + smooth)
        return loss.mean()
def total_loss(pred, target, alpha=0.8):
bce = nn.BCELoss()
loss_bce = bce(pred, target)
loss_dice = dice_loss(pred, target)
return alpha * loss_dice + (1 - alpha) * loss_bce
# --- Example loss calculation ---
dummy_target = torch.randint(0, 2, (2, 10, 1, 256, 256)).float()
loss = total_loss(output, dummy_target)
print(f"Calculated Loss: {loss.item()}")