Revolutionizing Breast Cancer Detection: How AI-Powered 3D Ultrasound Navigation Is Transforming Early Diagnosis

Introduction: The Critical Challenge in Breast Cancer Screening

Breast cancer remains the leading cause of cancer-related deaths among women worldwide, accounting for 15.5% of all female cancer fatalities according to 2024 global statistics. With incidence rates rising particularly in low- and middle-income regions, the need for accurate, accessible early detection has never been more urgent. While mammography has long served as the cornerstone of breast cancer screening, it faces significant limitations, especially for the 50% of women globally who have dense breast tissue, a figure that reaches 60-70% in Asian populations.

Handheld ultrasound (HHUS) has emerged as an indispensable complementary tool, offering non-radiative, cost-effective, and real-time imaging capabilities that excel at differentiating cystic from solid masses. However, conventional 2D ultrasound scanning suffers from three critical weaknesses:

Operator-dependent acquisition leading to inconsistent image quality

Subjective 2D interpretation varying significantly between sonographers

Imprecise clock-face annotation complicating longitudinal follow-up

These limitations create serious clinical consequences. Inaccurate lesion localization can lead to enlarged excision volumes, elevated recurrence risks post-surgery, and compromised treatment planning. A groundbreaking study published in Medical Image Analysis (2026) introduces an innovative solution: a navigation-guided 3D breast ultrasound scanning and reconstruction system that seamlessly integrates optical tracking, foundation AI models, and intelligent spatiotemporal analysis.


Understanding the Technology: Core Components of the Navigation-Guided System

The Hardware Architecture

The proposed system represents a sophisticated fusion of clinical ultrasound equipment and precision tracking technology. At its core, the setup comprises three integrated modules:

| Component | Specifications | Function |
|---|---|---|
| Clinical Ultrasound Console | Eagus R9 with L9-3U linear probe (2.5-9.0 MHz, 40 mm depth) | High-resolution B-mode imaging |
| Infrared Optical Tracker | Polaris system (NDI), 60 Hz maximum frequency | Real-time probe pose tracking |
| Mobile Workstation | Intel Core i9-12900H, 32 GB RAM, NVIDIA RTX 3080 Ti | Real-time processing and reconstruction |

The optical tracker provides a pyramidal measurement volume extending to 3.0 meters, mounted approximately 1.5 meters from the patient’s breast. This configuration enables comprehensive breast coverage without restricting sonographer movement.

Spatial and Temporal Calibration: The Foundation of Accuracy

Achieving precise 3D reconstruction requires meticulous calibration. The system employs an improved N-wire phantom method to compute the rigid similarity transform ${}^{P}_{I}\mathbf{T}$ that maps image coordinates to probe coordinates. Composed with the tracked probe pose, this transform carries each pixel into the reconstruction volume:

\[ \mathbf{x}_{t_i} = {}^{V}_{O}\mathbf{T} \cdot {}^{O}_{I}\mathbf{T}_{t_i} \cdot \begin{pmatrix} u f_x \\ v f_y \\ 0 \\ 1 \end{pmatrix} \]

Where:

  • $\mathbf{x}_{t_i}$ is the physical coordinate of pixel $p(u, v)$ at time $t_i$, with $f_x$ and $f_y$ the pixel spacings
  • ${}^{O}_{I}\mathbf{T}_{t_i}$ is the time-varying transform from the image frame to the optical tracker frame (the tracked probe pose composed with ${}^{P}_{I}\mathbf{T}$)
  • ${}^{V}_{O}\mathbf{T}$ is the custom alignment transform from the tracker frame to the reconstructed volume

Temporal calibration addresses synchronization between the optical tracker (20Hz) and ultrasound frame-grabber (50fps). By maximizing the cross-correlation function of normalized position sequences, the system achieves temporal alignment with errors below 3.8ms—well under the 40ms threshold that would compromise volume estimation accuracy.
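
To make the calibration chain concrete, the short sketch below (not taken from the authors' code) maps a pixel through the image-to-tracker and tracker-to-volume transforms and estimates the tracker/frame-grabber lag by cross-correlation. The function names are hypothetical, and the lag estimate assumes both motion signals have already been resampled to a common period dt_ms.

import numpy as np

def pixel_to_volume(u, v, fx, fy, T_image_to_tracker, T_tracker_to_volume):
    """Apply x_{t_i} = T_{O->V} @ T_{I->O}(t_i) @ [u*fx, v*fy, 0, 1]^T for one pixel."""
    p_image = np.array([u * fx, v * fy, 0.0, 1.0])
    return (T_tracker_to_volume @ T_image_to_tracker @ p_image)[:3]

def estimate_lag_ms(tracker_signal, image_signal, dt_ms):
    """Lag (in ms) maximizing the cross-correlation of two normalized motion signals."""
    a = (tracker_signal - tracker_signal.mean()) / tracker_signal.std()
    b = (image_signal - image_signal.mean()) / image_signal.std()
    corr = np.correlate(a, b, mode="full")
    lag_samples = np.argmax(corr) - (len(b) - 1)
    return lag_samples * dt_ms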


BUS-SAM-2: Fine-Tuned Foundation Model for Lesion Segmentation

The Power of Prompt-Driven Segmentation

Central to the system’s diagnostic capability is BUS-SAM-2 (Breast Ultrasound Segment Anything Model 2), a fine-tuned variant of Meta’s SAM-2 optimized for breast ultrasound imagery. Trained on a curated 4,000-image multi-center dataset spanning Poland, Brazil, Egypt, and China, this foundation model enables precise, prompt-driven segmentation of both nipples and lesions.

The segmentation process operates through a memory-augmented architecture:

\[ S_{t}^{i} = f_{\theta}^{\mathrm{BUS\!-\!SAM\!-\!2}} \!\left( I_{t}^{i},\, C_{t},\, M_{t-1}^{i} \right) \]

where the memory bank $M$ is updated by a lightweight transformer with streaming attention:

\[ M_{t}^{\,i} = g_{\phi}\!\left( S_{t}^{\,i}, M_{t-1}^{\,i} \right) \]

This design maintains temporal coherence without quadratic computational cost, enabling real-time processing across arbitrary-length sequences.
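
As a minimal sketch of this streaming design (with hypothetical segment_frame and update_memory callables standing in for $f_{\theta}$ and $g_{\phi}$), the propagation loop and its Dice-based stopping rule, listed among the advantages below, look roughly like this:

def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks (numpy arrays)."""
    inter = (a * b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + eps)

def propagate(frames, prompts, segment_frame, update_memory,
              dice_threshold=0.1, patience=3):
    memory, masks, prev_mask, low_streak = None, [], None, 0
    for frame in frames:
        mask = segment_frame(frame, prompts, memory)   # S_t^i = f_theta(I_t^i, C_t, M_{t-1}^i)
        memory = update_memory(mask, memory)           # M_t^i = g_phi(S_t^i, M_{t-1}^i)
        masks.append(mask)
        # Stop once Dice overlap with the previous mask stays below the
        # threshold for `patience` successive frames
        if prev_mask is not None and dice(mask, prev_mask) < dice_threshold:
            low_streak += 1
            if low_streak >= patience:
                break
        else:
            low_streak = 0
        prev_mask = mask
    return masks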

Key Advantages Over Traditional Methods

  • Minimal manual intervention: Requires only sparse positive/negative point prompts
  • Automatic propagation: Forward and reverse tracking terminates when Dice overlap falls below 0.1 across 3 successive frames
  • Subsequence extraction: Isolates clinically relevant segments from continuous scanning

Hybrid Lesion-Informed Spatiotemporal Transformer (HLST)

Mimicking Clinical Decision-Making

The diagnostic intelligence of the system resides in the Hybrid Lesion-informed Spatiotemporal Transformer (HLST), a novel architecture that replicates how experienced radiologists synthesize information across multiple ultrasound frames. Rather than relying on single-image classification—a major limitation of existing computer-aided diagnosis systems—HLST aggregates intra-lesional and peri-lesional dynamics across the entire temporal sequence.

The architecture processes inputs through three stages:

1. Dual-Stream Spatial Encoding. Frame-wise representations $F_{t_i} \in \mathbb{R}^{D}$ are extracted using a CNN-Transformer hybrid, with embeddings selectively aggregated from diagnostically relevant regions while suppressing irrelevant background.

2. Temporal Longformer Processing. The sequence of embeddings $F_{b:b+k-1} \in \mathbb{R}^{k \times D}$ is prepended with a learnable classification token $\tau \in \mathbb{R}^{D}$:

\[ \tilde{F} = \left[ \tau \, ; \, F_{b:b+k-1} \right] \in \mathbb{R}^{(k+1) \times D} \]

A Longformer encoder applies sparse-dense attention, combining local windowed attention on frame tokens with global attention on the classification token. This hybrid strategy reduces complexity from $O(n^2)$ to $O(n)$, accommodating clips of arbitrary duration.
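
A minimal sketch of the corresponding attention mask is shown below; the only assumptions are a single global classification token at position 0 and a symmetric local window of the given size.

import torch

def longformer_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean attention mask (True = attend); token 0 is the global CLS token."""
    idx = torch.arange(seq_len)
    mask = (idx[None, :] - idx[:, None]).abs() <= window // 2  # sliding local window
    mask[0, :] = True   # CLS attends to every frame token
    mask[:, 0] = True   # every frame token attends to CLS
    return mask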

3. Malignancy Prediction. The contextualized classification embedding $z_{\mathrm{cls}}$ feeds into a lightweight MLP, with training supervised by a class-balanced focal loss:

\[ L_{\mathrm{cls}} = -\,\alpha\, y \,(1-\hat{y})^{\gamma}\,\log(\hat{y}) \;-\; (1-\alpha)\,(1-y)\,\hat{y}^{\gamma}\,\log(1-\hat{y}) \]

This loss adaptively emphasizes hard-to-classify lesions while maintaining robustness against class imbalance common in breast imaging datasets.
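
For reference, a minimal PyTorch sketch of this binary focal loss, assuming $\hat{y}$ is a sigmoid probability and $y \in \{0, 1\}$ (the α and γ defaults shown are common choices, not necessarily the paper's values):

import torch

def focal_loss(y_hat: torch.Tensor, y: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Class-balanced binary focal loss, following the formula above."""
    eps = 1e-8
    pos = -alpha * y * (1 - y_hat).pow(gamma) * torch.log(y_hat + eps)
    neg = -(1 - alpha) * (1 - y) * y_hat.pow(gamma) * torch.log(1 - y_hat + eps)
    return (pos + neg).mean()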

Breakthrough Performance Results

| Metric | HLST (Proposed) | Best Alternative | Improvement |
|---|---|---|---|
| Accuracy | 86.1% | 79.6% | +6.5% |
| Precision | 84.9% | 81.3% | +3.6% |
| Recall | 92.7% | 85.7% | +7.0% |
| F1-Score | 88.5% | 83.2% | +5.3% |
| AUC | 86.8% | 82.7% | +4.1% |

Table: Performance comparison on BUV dataset (186 breast ultrasound videos). HLST significantly outperforms TimeSformer, VideoSwin, UniFormerV2, C3D, and SlowFast architectures (p < 0.05).


Automated Nipple-Centric Localization: Eliminating Operator Dependency

Geometry-Adaptive Clock Projection

One of the system’s most clinically valuable innovations is automated standardized spatial annotation without requiring patient-attached fiducials or pre-marked landmarks. The method establishes a breast-specific polar coordinate system through:

Step 1: Craniocaudal Direction Estimation. The system requires sonographers to begin with a standardized downward probe motion, enabling computation of the body axis:

\[ V_{nc} = \frac{\kappa}{|T|^{2}} \left( \sum_{i=1}^{\kappa |T|/2} {}^{V}_{I}\mathbf{T}\,\mathbf{p}_{t_i} \;-\; \sum_{i=\kappa |T|/2 + 1}^{\kappa |T|} {}^{V}_{I}\mathbf{T}\,\mathbf{p}_{t_i} \right) \]

Step 2: Ellipsoid Fitting for Projection Plane. The breast surface is modeled as a flattened ellipsoid with parameters $\Theta_e = (\chi, a, b, c, \alpha, \beta, \gamma)$, optimized to minimize geometric error while ensuring all lesion voxels lie within the volume:

\[ \inf_{P_e} \sum_{\nu_{s_i} \in \widetilde{\mathcal{V}}_{s}^{+}} E_2\!\left(\nu_{s_i}; \Theta_e\right) \]

Step 3: Clock-Face Orientation Calculation. For each lesion $l_i$, the clockwise nipple-centric angle is computed as:

\[ \phi_{l}^{\,i} = \frac{\operatorname{atan2}\!\left( \left( \mathbf{V}_{n l}^{\,i\,\prime} \times \tilde{\mathbf{V}}_{n c}^{\,\prime} \right) \cdot \tilde{\mathbf{V}}_{n p}, \; \tilde{\mathbf{V}}_{n c}^{\,\prime} \cdot \mathbf{V}_{n l}^{\,i\,\prime} \right)}{2\pi} \]

This yields standardized clock-face positions (1-12 o’clock) comparable across examinations and institutions.
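
As a small illustration, assuming the angle is expressed in radians and measured clockwise from the craniocaudal reference, the mapping to a 1-12 o'clock label can be written as:

import math

def angle_to_clock_hour(phi: float) -> int:
    """Map a clockwise nipple-centric angle (radians) to a 1-12 o'clock label."""
    hour = round((phi % (2 * math.pi)) / (2 * math.pi) * 12)
    return 12 if hour == 0 else hour

# Example: a lesion 90 degrees clockwise from the craniocaudal axis
# angle_to_clock_hour(math.pi / 2)  ->  3 (o'clock)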

Phantom Validation: Precision Meets Practice

Validation on three breast phantoms (designated H, S, and C) demonstrated exceptional accuracy against CT reference standards:

| Parameter | Correlation (r) | p-value | Median Absolute Error |
|---|---|---|---|
| Distance to nipple | >0.99 | <0.0001 | 0.35 mm |
| 3D lesion size | >0.97 | <0.0001 | 1.17 × 0.70 × 0.83 mm |
| Clock-face angle | 1.00 | <0.0001 | 0.77° |

Table: Phantom study results showing high correlation with CT ground truth across morphological and positional parameters.

The mean Hausdorff distance of 2.39mm and center distance of 0.85mm between ultrasound and CT reconstructions confirm clinical-grade precision. Notably, the system maintains this accuracy across deformable phantoms, with intra-observer CV ≤4.09% and inter-observer ICC ≥0.953—surpassing existing systems by substantial margins.
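
For readers who want to compute the same agreement metrics on their own reconstructions, a small sketch using SciPy's directed Hausdorff distance (the point-cloud inputs are placeholders, not the study's data):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_mm(us_points: np.ndarray, ct_points: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two lesion-surface point clouds (mm)."""
    d_forward, _, _ = directed_hausdorff(us_points, ct_points)
    d_backward, _, _ = directed_hausdorff(ct_points, us_points)
    return max(d_forward, d_backward)

def center_distance_mm(us_points: np.ndarray, ct_points: np.ndarray) -> float:
    """Distance between the centroids of the two reconstructions (mm)."""
    return float(np.linalg.norm(us_points.mean(axis=0) - ct_points.mean(axis=0)))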


Clinical Impact and Real-World Performance

Seamless Workflow Integration

The system has been implemented as BreastLesion3DLocalizer, a 3D Slicer extension module designed for routine clinical deployment. Processing efficiency metrics demonstrate practical viability:

  • Segmentation speed: 42 frames per second
  • Reconstruction speed: 74 frames per second
  • Total processing time: ~2 minutes for a 2,000-frame sequence (including 15s segmentation, 30s reconstruction, 35s localization)

Clinical Study Outcomes

Evaluation across 43 female patients (86 breasts) at Shanghai Sixth People’s Hospital revealed:

  • Median clock-face discrepancy: 0 hours (perfect agreement)
  • Mean clock-face discrepancy: 0.51 hours
  • 2D size median difference: 0.7mm × 0.6mm
  • BI-RADS concordance: 95.65% (threshold ≥4a), 93.48% (threshold ≥4b)

Advantages Over Existing Technologies

Compared to Automated Breast Volume Scanning (ABVS)

| Feature | ABVS | Navigation-Guided HHUS |
|---|---|---|
| Patient comfort | Requires compression | Contact-only scanning |
| Scanning range | Limited field of view | Full breast coverage |
| Examination time | Prolonged | Comparable to routine |
| Cost | High equipment investment | Leverages existing probes |
| Workflow disruption | Requires dedicated room | Integrates seamlessly |

Compared to Conventional Freehand Systems

The proposed system eliminates critical limitations of previous tracked ultrasound approaches:

  • No patient-mounted sensors required (unlike Šroubek et al., 2019)
  • Arbitrary tracker positioning—no need for bed-parallel alignment
  • Adaptive to any breast geometry without anisotropic scaling to generic models
  • Fully automated 3D visualization with quantitative morphometrics

Future Directions and Technical Evolution

While current results are clinically compelling, ongoing research targets several enhancement pathways:

Deformation Compensation Integration of RGB-D cameras for real-time breast surface monitoring could enable pixel-wise deformation correction, recovering compression-free anatomy through probe pose adjustment and respiratory compensation.

Marker-less Tracking Foundation models for 6DOF pose estimation (e.g., FoundationPose) may eventually substitute optical trackers, reducing system cost while maintaining sub-millimeter accuracy.

Doppler Integration Extension to 3D power Doppler reconstruction would enable comprehensive vascular assessment, enriching diagnostic information for lesion characterization.

Automated Prompting Concept prompt engineering could eliminate remaining manual intervention in lesion segmentation, progressing toward fully autonomous screening.


Conclusion: A New Standard for Breast Ultrasound

The navigation-guided 3D breast ultrasound system represents a paradigm shift in early cancer detection—transforming operator-dependent 2D imaging into standardized, quantitative, 3D diagnostic intelligence. By achieving 86.1% diagnostic accuracy while eliminating the variability inherent in conventional practice, this technology addresses critical gaps in the breast cancer care continuum.

For healthcare institutions, the system offers:

  • Enhanced diagnostic confidence through foundation-model-augmented analysis
  • Standardized follow-up protocols with precise spatial referencing
  • Improved biopsy guidance and preoperative planning capabilities
  • Reduced operator training burden through intelligent automation

As breast cancer incidence continues rising globally, technologies that combine clinical accessibility with diagnostic excellence become essential infrastructure. This AI-powered navigation system demonstrates that sophisticated 3D reconstruction and intelligent sequence analysis can coexist with the efficiency and patient comfort of handheld ultrasound—pointing toward a future where early detection is both universally available and consistently accurate.


Ready to explore how AI is transforming medical imaging? Subscribe to our newsletter for the latest breakthroughs in diagnostic technology, or share this article with healthcare professionals interested in next-generation ultrasound systems. Have questions about 3D breast imaging implementation? Join the discussion in our comments section below—our medical technology experts respond to every inquiry.


This article is based on peer-reviewed research published in Medical Image Analysis (Volume 110, 2026) by Zhang et al. from Shanghai Jiao Tong University and Shanghai Sixth People’s Hospital.

Below is a reference implementation sketch of the navigation-guided 3D breast ultrasound pipeline, covering the major components: calibration, segmentation with BUS-SAM-2, the HLST classifier, 3D volume reconstruction, and nipple-centric lesion localization. Several modules are simplified stand-ins for the models described in the paper rather than the authors' released code.

"""
Navigation-Guided 3D Breast Ultrasound Scanning and Reconstruction System
Based on: Zhang et al., Medical Image Analysis 110 (2026) 103965

This implementation includes:
1. Spatial and Temporal Calibration
2. BUS-SAM-2 Fine-tuning and Inference
3. Hybrid Lesion-informed Spatiotemporal Transformer (HLST)
4. 3D Volume Reconstruction with Hole-filling
5. Nipple-centric Lesion Localization
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import numpy as np
import cv2
from typing import List, Tuple, Dict, Optional
from dataclasses import dataclass
from scipy.spatial.transform import Rotation as R, Slerp
from scipy.optimize import minimize
from scipy import ndimage
from skimage import measure
import open3d as o3d
from collections import deque
import math


# =============================================================================
# CONFIGURATION
# =============================================================================

@dataclass
class SystemConfig:
    """System configuration parameters"""
    # Hardware specs
    image_width: int = 650
    image_height: int = 600
    fps: int = 20  # Sampling rate
    tracker_freq: int = 60  # Hz
    
    # Ultrasound parameters
    imaging_depth: float = 40.0  # mm
    pixel_spacing: float = 0.1  # mm/pixel (example)
    
    # Reconstruction parameters
    voxel_spacing: Tuple[float, float, float] = (0.5, 0.5, 0.5)  # mm
    hole_fill_iterations: int = 50
    
    # SAM-2 parameters
    sam2_model: str = "tiny"
    sam2_checkpoint: str = "sam2_tiny.pth"
    
    # HLST parameters
    hlst_hidden_dim: int = 512
    hlst_num_heads: int = 8
    hlst_num_layers: int = 4
    hlst_window_size: int = 32
    hlst_max_seq_len: int = 1024
    
    # Classification
    num_classes: int = 2  # benign/malignant
    focal_alpha: float = 0.25
    focal_gamma: float = 2.0


# =============================================================================
# 1. SPATIAL AND TEMPORAL CALIBRATION
# =============================================================================

class SpatialCalibrator:
    """
    N-wire phantom-based spatial calibration
    Maps image coordinates {I} to probe coordinates {P}
    """
    
    def __init__(self):
        self.T_image_to_probe = None  # 4x4 homogeneous transformation
        self.residual_error = None
        
    def detect_wire_intersections(self, image: np.ndarray) -> np.ndarray:
        """
        Detect wire intersections in N-wire phantom image
        
        Args:
            image: Ultrasound image of N-wire phantom
            
        Returns:
            intersections: Nx2 array of (u, v) pixel coordinates
        """
        # Preprocessing
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) if len(image.shape) == 3 else image
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        
        # Thresholding to detect wires
        _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        
        # Morphological operations to clean up
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
        
        # Detect line segments (wires)
        edges = cv2.Canny(cleaned, 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi/180, threshold=50, 
                                minLineLength=30, maxLineGap=10)
        
        # Find intersections between lines
        intersections = self._compute_intersections(lines)
        return intersections
    
    def _compute_intersections(self, lines) -> np.ndarray:
        """Compute intersection points between detected line segments"""
        if lines is None or len(lines) < 2:
            return np.array([])
        
        intersections = []
        for i in range(len(lines)):
            for j in range(i+1, len(lines)):
                x1, y1, x2, y2 = lines[i][0]
                x3, y3, x4, y4 = lines[j][0]
                
                # Line intersection calculation
                denom = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
                if abs(denom) > 1e-10:
                    t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / denom
                    u = -((x1-x2)*(y1-y3) - (y1-y2)*(x1-x3)) / denom
                    
                    if 0 <= t <= 1 and 0 <= u <= 1:
                        px = x1 + t * (x2 - x1)
                        py = y1 + t * (y2 - y1)
                        intersections.append([px, py])
        
        return np.array(intersections) if intersections else np.array([])
    
    def calibrate(self, image_points: np.ndarray, 
                  probe_points: np.ndarray) -> np.ndarray:
        """
        Compute T_image_to_probe using point correspondences
        
        Args:
            image_points: Nx2 pixel coordinates
            probe_points: Nx3 physical coordinates in probe frame
            
        Returns:
            T: 4x4 transformation matrix
        """
        assert len(image_points) == len(probe_points) >= 6
        
        # Lift pixel coordinates onto the image plane (z = 0) and build
        # homogeneous coordinates
        image_points_3d = np.hstack([image_points,
                                     np.zeros((len(image_points), 1))])
        image_h = np.hstack([image_points_3d,
                             np.ones((len(image_points), 1))])
        
        # Solve for the rigid transformation using SVD (Kabsch algorithm)
        # Minimize: ||probe_points - T @ image_points||^2
        
        # Center the points (3D image points so rotation and translation
        # have consistent shapes)
        centroid_img = np.mean(image_points_3d, axis=0)
        centroid_probe = np.mean(probe_points, axis=0)
        
        img_centered = image_points_3d - centroid_img
        probe_centered = probe_points - centroid_probe
        
        # Compute optimal rotation using SVD
        H = img_centered.T @ probe_centered
        U, S, Vt = np.linalg.svd(H)
        R_mat = Vt.T @ U.T
        
        # Ensure right-handed coordinate system
        if np.linalg.det(R_mat) < 0:
            Vt[-1, :] *= -1
            R_mat = Vt.T @ U.T
        
        # Compute translation
        t = centroid_probe - R_mat @ centroid_img
        
        # Construct 4x4 transformation
        T = np.eye(4)
        T[:3, :3] = R_mat
        T[:3, 3] = t
        
        # Enforce orthogonality constraint
        T = self._enforce_orthogonality(T)
        
        self.T_image_to_probe = T
        
        # Compute residual error
        transformed = (T @ image_h.T).T
        residuals = probe_points - transformed[:, :3]
        self.residual_error = np.mean(np.linalg.norm(residuals, axis=1))
        
        return T
    
    def _enforce_orthogonality(self, T: np.ndarray) -> np.ndarray:
        """Enforce orthogonality of rotation matrix"""
        R_mat = T[:3, :3]
        U, _, Vt = np.linalg.svd(R_mat)
        R_orthogonal = U @ Vt
        T[:3, :3] = R_orthogonal
        return T
    
    def transform_point(self, pixel_coord: np.ndarray) -> np.ndarray:
        """Transform pixel coordinate to probe coordinate"""
        if self.T_image_to_probe is None:
            raise ValueError("Calibrator not calibrated yet")
        
        pixel_h = np.array([pixel_coord[0], pixel_coord[1], 0, 1])
        probe_coord = self.T_image_to_probe @ pixel_h
        return probe_coord[:3]


class TemporalCalibrator:
    """
    Temporal calibration using cross-correlation of periodic motion
    """
    
    def __init__(self):
        self.time_shift = None  # milliseconds
        self.cross_correlation = None
        
    def calibrate(self, tracker_positions: np.ndarray, 
                  image_positions: np.ndarray,
                  tracker_timestamps: np.ndarray,
                  image_timestamps: np.ndarray) -> float:
        """
        Determine time shift between tracker and image streams
        
        Args:
            tracker_positions: 1D position sequence from tracker
            image_positions: 1D position sequence from image (bottom line)
            tracker_timestamps: timestamps for tracker data
            image_timestamps: timestamps for image data
            
        Returns:
            time_shift: optimal time shift in milliseconds
        """
        # Normalize sequences
        tracker_norm = (tracker_positions - np.mean(tracker_positions)) / \
                       np.std(tracker_positions)
        image_norm = (image_positions - np.mean(image_positions)) / \
                     np.std(image_positions)
        
        # Compute cross-correlation
        correlation = np.correlate(tracker_norm, image_norm, mode='full')
        lags = np.arange(-len(image_norm) + 1, len(tracker_norm))
        
        # Find peak
        peak_idx = np.argmax(np.abs(correlation))
        lag_samples = lags[peak_idx]
        
        # Convert to time
        dt_tracker = np.mean(np.diff(tracker_timestamps))
        time_shift = lag_samples * dt_tracker * 1000  # ms
        
        self.time_shift = time_shift
        self.cross_correlation = correlation[peak_idx] / \
                                (len(tracker_norm) * np.std(tracker_norm) * 
                                 np.std(image_norm))
        
        return time_shift
    
    def synchronize(self, tracker_data: List[Dict], 
                    image_timestamps: np.ndarray) -> List[Dict]:
        """
        Synchronize tracker data to image timestamps
        
        Args:
            tracker_data: List of {'timestamp': t, 'pose': T} dicts
            image_timestamps: target timestamps
            
        Returns:
            synchronized_poses: List of poses at image timestamps
        """
        # Apply time shift
        tracker_timestamps = np.array([d['timestamp'] for d in tracker_data])
        tracker_timestamps_shifted = tracker_timestamps - self.time_shift / 1000
        
        # Interpolate poses at image timestamps
        synchronized_poses = []
        for img_ts in image_timestamps:
            # Find surrounding tracker frames
            idx = np.searchsorted(tracker_timestamps_shifted, img_ts)
            
            if idx == 0:
                pose = tracker_data[0]['pose']
            elif idx >= len(tracker_data):
                pose = tracker_data[-1]['pose']
            else:
                # Linear interpolation
                t0, t1 = tracker_timestamps_shifted[idx-1], \
                         tracker_timestamps_shifted[idx]
                alpha = (img_ts - t0) / (t1 - t0)
                pose0, pose1 = tracker_data[idx-1]['pose'], \
                               tracker_data[idx]['pose']
                pose = self._interpolate_pose(pose0, pose1, alpha)
            
            synchronized_poses.append(pose)
        
        return synchronized_poses
    
    def _interpolate_pose(self, pose0: np.ndarray, pose1: np.ndarray, 
                          alpha: float) -> np.ndarray:
        """Interpolate between two poses using SLERP for rotation"""
        # Decompose poses
        R0, t0 = pose0[:3, :3], pose0[:3, 3]
        R1, t1 = pose1[:3, :3], pose1[:3, 3]
        
        # Linear interpolation for translation
        t = (1 - alpha) * t0 + alpha * t1
        
        # SLERP for rotation (scipy's Slerp handles quaternion sign and normalization)
        slerp = Slerp([0.0, 1.0], R.from_matrix(np.stack([R0, R1])))
        R_interp = slerp([alpha]).as_matrix()[0]
        
        # Reconstruct pose
        pose = np.eye(4)
        pose[:3, :3] = R_interp
        pose[:3, 3] = t
        
        return pose


# =============================================================================
# 2. BUS-SAM-2: BREAST ULTRASOUND SEGMENTATION
# =============================================================================

class PromptEncoder(nn.Module):
    """Encode point and box prompts for SAM-2"""
    
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.embed_dim = embed_dim
        
        # Point encoder
        self.point_embed = nn.Sequential(
            nn.Linear(2, embed_dim // 2),
            nn.ReLU(),
            nn.Linear(embed_dim // 2, embed_dim)
        )
        
        # Label encoder (positive/negative point)
        self.label_embed = nn.Embedding(2, embed_dim)
        
    def forward(self, points: torch.Tensor, 
                labels: torch.Tensor) -> torch.Tensor:
        """
        Args:
            points: (B, N, 2) point coordinates (x, y)
            labels: (B, N) point labels (1=positive, 0=negative)
        Returns:
            prompt_embeddings: (B, N, embed_dim)
        """
        point_embeds = self.point_embed(points)
        label_embeds = self.label_embed(labels.long())
        return point_embeds + label_embeds


class MemoryAttention(nn.Module):
    """Streaming attention for temporal memory in SAM-2"""
    
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        
    def forward(self, query: torch.Tensor, 
                memory: torch.Tensor,
                memory_mask: Optional[torch.Tensor] = None):
        B, N, C = query.shape
        _, M, _ = memory.shape
        
        Q = self.q_proj(query).view(B, N, self.num_heads, C // self.num_heads)
        K = self.k_proj(memory).view(B, M, self.num_heads, C // self.num_heads)
        V = self.v_proj(memory).view(B, M, self.num_heads, C // self.num_heads)
        
        # Attention scores
        attn = torch.einsum('bnhd,bmhd->bhnm', Q, K) * self.scale
        
        if memory_mask is not None:
            attn = attn.masked_fill(memory_mask.unsqueeze(1).unsqueeze(1), 
                                   float('-inf'))
        
        attn = F.softmax(attn, dim=-1)
        out = torch.einsum('bhnm,bmhd->bnhd', attn, V)
        out = out.reshape(B, N, C)
        
        return self.out_proj(out)


class BUSSAM2(nn.Module):
    """
    BUS-SAM-2: Fine-tuned Segment Anything Model 2 for Breast Ultrasound
    
    Simplified implementation based on SAM-2 architecture with memory bank
    for video object segmentation.
    """
    
    def __init__(self, config: SystemConfig):
        super().__init__()
        self.config = config
        self.embed_dim = 256
        
        # Image encoder (Vision Transformer based)
        self.image_encoder = self._build_image_encoder()
        
        # Prompt encoder
        self.prompt_encoder = PromptEncoder(self.embed_dim)
        
        # Memory encoder
        self.memory_encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, self.embed_dim, 3, padding=1)
        )
        
        # Memory attention
        self.memory_attention = MemoryAttention(self.embed_dim)
        
        # Mask decoder (lightweight transformer)
        self.mask_decoder = self._build_mask_decoder()
        
        # IoU prediction head (defined here so its weights train and move with the model)
        self.iou_head = nn.Linear(self.embed_dim, 1)
        
        # Memory bank
        self.memory_bank = deque(maxlen=16)  # Store recent frames
        
    def _build_image_encoder(self):
        """Build hierarchical image encoder"""
        # Simplified: In practice, use SAM-2's actual image encoder
        return nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(3, stride=2, padding=1),
            # ResNet-like blocks would follow
            nn.Conv2d(64, self.embed_dim, 1)
        )
    
    def _build_mask_decoder(self):
        """Build transformer-based mask decoder"""
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=self.embed_dim,
            nhead=8,
            dim_feedforward=2048,
            batch_first=True
        )
        return nn.TransformerDecoder(decoder_layer, num_layers=2)
    
    def encode_image(self, image: torch.Tensor) -> torch.Tensor:
        """Encode image to feature map"""
        # image: (B, 1, H, W)
        features = self.image_encoder(image)  # (B, embed_dim, H', W')
        return features
    
    def encode_memory(self, mask: torch.Tensor, 
                      image_features: torch.Tensor) -> torch.Tensor:
        """Encode previous mask and features to memory"""
        # mask: (B, 1, H, W)
        memory = self.memory_encoder(mask)
        # Combine with image features
        combined = memory + F.interpolate(
            image_features, size=memory.shape[2:], mode='bilinear'
        )
        # Global average pooling to tokens
        B, C, H, W = combined.shape
        return combined.view(B, C, H * W).permute(0, 2, 1)  # (B, HW, C)
    
    def forward(self, image: torch.Tensor,
                points: torch.Tensor,
                point_labels: torch.Tensor,
                memory: Optional[torch.Tensor] = None) -> Dict[str, torch.Tensor]:
        """
        Forward pass for single frame
        
        Args:
            image: (B, 1, H, W) ultrasound image
            points: (B, N, 2) prompt points
            point_labels: (B, N) point labels (1=positive, 0=negative)
            memory: (B, M, C) optional memory from previous frames
            
        Returns:
            Dict with 'mask', 'memory', 'iou_pred'
        """
        B = image.shape[0]
        
        # Encode image
        image_features = self.encode_image(image)  # (B, C, H', W')
        H_f, W_f = image_features.shape[2:]
        
        # Encode prompts
        prompt_embeds = self.prompt_encoder(points, point_labels)  # (B, N, C)
        
        # Prepare query tokens
        # Combine prompt embeddings with positional encodings
        queries = prompt_embeds
        
        # Memory attention if memory exists
        if memory is not None and len(memory) > 0:
            memory_tensor = torch.cat(list(memory), dim=1) if \
                isinstance(memory, (list, deque)) else memory
            queries = self.memory_attention(queries, memory_tensor)
        
        # Decode mask
        # Flatten spatial features
        spatial_tokens = image_features.flatten(2).permute(0, 2, 1)  # (B, H'W', C)
        
        # Transformer decoding
        decoded = self.mask_decoder(queries, spatial_tokens)
        
        # Generate mask from decoded tokens
        # Use first token (aggregated) for mask prediction
        mask_token = decoded[:, 0, :]  # (B, C)
        
        # Upsample to original resolution
        mask_pred = torch.einsum('bc,bchw->bhw', mask_token, 
                                  image_features).unsqueeze(1)
        mask_pred = F.interpolate(mask_pred, size=image.shape[2:], 
                                   mode='bilinear', align_corners=False)
        
        # IoU prediction head (module defined in __init__)
        iou_pred = torch.sigmoid(self.iou_head(mask_token))
        
        # Update memory
        new_memory = self.encode_memory(
            (mask_pred > 0).float(), image_features
        )
        
        return {
            'mask': torch.sigmoid(mask_pred),
            'memory': new_memory,
            'iou_pred': iou_pred
        }
    
    def segment_sequence(self, 
                         images: torch.Tensor,
                         initial_points: torch.Tensor,
                         initial_labels: torch.Tensor,
                         iou_threshold: float = 0.1,
                         consecutive_frames: int = 3) -> torch.Tensor:
        """
        Segment entire video sequence with forward-backward propagation
        
        Args:
            images: (T, 1, H, W) image sequence
            initial_points: (N, 2) initial prompt points
            initial_labels: (N,) initial point labels
            
        Returns:
            masks: (T, 1, H, W) segmentation masks
        """
        T = images.shape[0]
        masks = []
        self.memory_bank.clear()
        
        # Forward propagation
        forward_masks = []
        prev_mask = None
        consecutive_low = 0  # successive frames with low Dice overlap
        
        for t in range(T):
            # Use initial prompts for first frame, previous mask center for others
            if t == 0:
                points = initial_points.unsqueeze(0)
                labels = initial_labels.unsqueeze(0)
            else:
                # Find center of previous mask for propagation
                if prev_mask is not None:
                    center = self._find_mask_center(prev_mask[0, 0])
                    points = center.unsqueeze(0).unsqueeze(0)  # (1, 1, 2)
                    labels = torch.ones(1, 1, device=images.device)
                else:
                    points = initial_points.unsqueeze(0)
                    labels = initial_labels.unsqueeze(0)
            
            result = self.forward(
                images[t:t+1], points, labels,
                memory=list(self.memory_bank) if self.memory_bank else None
            )
            
            mask = result['mask']
            forward_masks.append(mask)
            
            # Update memory
            self.memory_bank.append(result['memory'])
            prev_mask = mask
            
            # Check termination condition
            if t > 0 and self._compute_dice(forward_masks[-1], 
                                            forward_masks[-2]) < iou_threshold:
                consecutive_low += 1
                if consecutive_low >= consecutive_frames:
                    break
            else:
                consecutive_low = 0
        
        # Backward propagation from termination point
        # (Similar logic, omitted for brevity)
        
        return torch.cat(forward_masks, dim=0)
    
    def _find_mask_center(self, mask: torch.Tensor) -> torch.Tensor:
        """Find centroid of binary mask"""
        coords = torch.nonzero(mask > 0.5, as_tuple=False)
        if len(coords) == 0:
            return torch.zeros(2, device=mask.device)
        return coords.float().mean(dim=0)[[1, 0]]  # (y, x) -> (x, y)
    
    def _compute_dice(self, mask1: torch.Tensor, 
                      mask2: torch.Tensor, 
                      threshold: float = 0.5) -> float:
        """Compute Dice similarity coefficient"""
        m1 = (mask1 > threshold).float()
        m2 = (mask2 > threshold).float()
        intersection = (m1 * m2).sum()
        return (2.0 * intersection / (m1.sum() + m2.sum() + 1e-8)).item()


# =============================================================================
# 3. HYBRID LESION-INFORMED SPATIOTEMPORAL TRANSFORMER (HLST)
# =============================================================================

class DualStreamSpatialEncoder(nn.Module):
    """
    CNN-Transformer spatial encoder with intra/peri-lesional feature aggregation
    """
    
    def __init__(self, config: SystemConfig):
        super().__init__()
        self.config = config
        
        # CNN backbone for local features
        self.cnn_backbone = nn.Sequential(
            # Block 1
            nn.Conv2d(1, 64, 7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
            
            # Block 2-4 (ResNet-style)
            self._make_layer(64, 64, 2),
            self._make_layer(64, 128, 2, stride=2),
            self._make_layer(128, 256, 2, stride=2),
            self._make_layer(256, 512, 2, stride=2),
        )
        
        # Transformer for global context
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=512,
            nhead=8,
            dim_feedforward=2048,
            dropout=0.1,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=4)
        
        # Lesion-guided attention
        self.lesion_attention = nn.Sequential(
            nn.Conv2d(512, 128, 1),
            nn.ReLU(),
            nn.Conv2d(128, 1, 1),
            nn.Sigmoid()
        )
        
        # Projection to embedding space
        self.embed_proj = nn.Linear(512, config.hlst_hidden_dim)
        
    def _make_layer(self, in_ch: int, out_ch: int, 
                    num_blocks: int, stride: int = 1):
        """Create ResNet-style layer"""
        layers = []
        layers.append(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1))
        layers.append(nn.BatchNorm2d(out_ch))
        layers.append(nn.ReLU(inplace=True))
        
        for _ in range(num_blocks - 1):
            layers.append(nn.Conv2d(out_ch, out_ch, 3, padding=1))
            layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.ReLU(inplace=True))
        
        return nn.Sequential(*layers)
    
    def forward(self, image: torch.Tensor, 
                lesion_mask: torch.Tensor) -> torch.Tensor:
        """
        Args:
            image: (B, 1, H, W) ultrasound image
            lesion_mask: (B, 1, H, W) lesion segmentation mask
            
        Returns:
            embedding: (B, hidden_dim) frame-level feature embedding
        """
        # Extract CNN features
        cnn_features = self.cnn_backbone(image)  # (B, 512, H', W')
        
        # Lesion-guided spatial attention
        attention_map = self.lesion_attention(cnn_features)  # (B, 1, H', W')
        
        # Weighted features: combine intra-lesional and peri-lesional
        weighted_features = cnn_features * attention_map
        
        # Global average pooling with lesion focus
        intra_lesion = (weighted_features * F.interpolate(
            lesion_mask, size=cnn_features.shape[2:], mode='nearest'
        )).sum(dim=[2, 3]) / (lesion_mask.sum(dim=[2, 3]) + 1e-8)
        
        peri_lesion = weighted_features.mean(dim=[2, 3])
        
        # Combine features
        combined = intra_lesion + peri_lesion  # (B, 512)
        
        # Transformer processing for global context
        # Reshape to sequence (add dummy sequence dimension)
        seq = combined.unsqueeze(1)  # (B, 1, 512)
        transformed = self.transformer(seq)  # (B, 1, 512)
        
        # Project to embedding space
        embedding = self.embed_proj(transformed.squeeze(1))  # (B, hidden_dim)
        
        return embedding


class LongformerAttention(nn.Module):
    """
    Sparse-dense attention from Longformer
    Combines local windowed attention with global attention on CLS token
    """
    
    def __init__(self, dim: int, num_heads: int, window_size: int):
        super().__init__()
        self.num_heads = num_heads
        self.window_size = window_size
        self.scale = (dim // num_heads) ** -0.5
        
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        
        # Global attention for CLS token
        self.global_q = nn.Linear(dim, dim)
        
    def forward(self, x: torch.Tensor, 
                is_cls_token: torch.Tensor) -> torch.Tensor:
        """
        Args:
            x: (B, N, C) input sequence
            is_cls_token: (B, N) bool indicating CLS token positions
            
        Returns:
            attended: (B, N, C)
        """
        B, N, C = x.shape
        
        # Standard local windowed attention
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        qkv = qkv.permute(2, 0, 3, 1, 4)  # (3, B, H, N, D)
        q, k, v = qkv[0], qkv[1], qkv[2]
        
        # Create local attention mask (sliding window)
        attn = torch.zeros(B, self.num_heads, N, N, device=x.device)
        
        for i in range(N):
            window_start = max(0, i - self.window_size // 2)
            window_end = min(N, i + self.window_size // 2 + 1)
            attn[:, :, i, window_start:window_end] = \
                torch.einsum('bhd,bhmd->bhm', q[:, :, i],
                             k[:, :, window_start:window_end])
        
        # Global attention for CLS token (attends to all, all attend to it)
        global_q = self.global_q(x) * is_cls_token.unsqueeze(-1)
        global_q = global_q.reshape(B, N, self.num_heads, C // self.num_heads)
        global_q = global_q.permute(0, 2, 1, 3)  # (B, H, N, D)
        
        global_attn = torch.einsum('bhid,bhjd->bhij', global_q, k)
        attn = attn + global_attn
        
        attn = attn * self.scale
        attn = F.softmax(attn, dim=-1)
        
        out = torch.einsum('bhnm,bhmd->bhnd', attn, v)
        out = out.transpose(1, 2).reshape(B, N, C)
        
        return self.proj(out)


class HLST(nn.Module):
    """
    Hybrid Lesion-informed Spatiotemporal Transformer
    
    Sequence-level malignancy classification for breast ultrasound videos
    """
    
    def __init__(self, config: SystemConfig):
        super().__init__()
        self.config = config
        
        # Spatial encoder
        self.spatial_encoder = DualStreamSpatialEncoder(config)
        
        # CLS token
        self.cls_token = nn.Parameter(torch.randn(1, 1, config.hlst_hidden_dim))
        
        # Temporal Longformer encoder
        self.temporal_layers = nn.ModuleList([
            LongformerAttention(
                config.hlst_hidden_dim, 
                config.hlst_num_heads,
                config.hlst_window_size
            ) for _ in range(config.hlst_num_layers)
        ])
        
        self.temporal_norms = nn.ModuleList([
            nn.LayerNorm(config.hlst_hidden_dim) 
            for _ in range(config.hlst_num_layers)
        ])
        
        # Classification head
        self.classifier = nn.Sequential(
            nn.Linear(config.hlst_hidden_dim, config.hlst_hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(config.hlst_hidden_dim // 2, config.num_classes)
        )
        
    def forward(self, images: torch.Tensor, 
                lesion_masks: torch.Tensor) -> torch.Tensor:
        """
        Args:
            images: (B, T, 1, H, W) video sequence
            lesion_masks: (B, T, 1, H, W) lesion segmentation masks
            
        Returns:
            logits: (B, num_classes) classification logits
        """
        B, T, C, H, W = images.shape
        
        # Reshape for spatial encoding
        images_flat = images.view(B * T, C, H, W)
        masks_flat = lesion_masks.view(B * T, 1, H, W)
        
        # Spatial encoding for all frames
        spatial_embeds = self.spatial_encoder(images_flat, masks_flat)
        spatial_embeds = spatial_embeds.view(B, T, -1)  # (B, T, hidden_dim)
        
        # Prepend CLS token
        cls_tokens = self.cls_token.expand(B, -1, -1)
        sequence = torch.cat([cls_tokens, spatial_embeds], dim=1)  # (B, T+1, C)
        
        # Create CLS mask for global attention
        cls_mask = torch.zeros(B, T + 1, dtype=torch.bool, device=images.device)
        cls_mask[:, 0] = True
        
        # Temporal Longformer processing
        x = sequence
        for layer, norm in zip(self.temporal_layers, self.temporal_norms):
            x = x + layer(norm(x), cls_mask)
        
        # Extract CLS embedding
        cls_embed = x[:, 0]  # (B, hidden_dim)
        
        # Classification
        logits = self.classifier(cls_embed)
        
        return logits
    
    def compute_loss(self, logits: torch.Tensor, 
                     labels: torch.Tensor) -> torch.Tensor:
        """
        Focal loss for class-balanced training
        
        Args:
            logits: (B, num_classes)
            labels: (B,) ground truth labels
        """
        probs = F.softmax(logits, dim=-1)
        targets = F.one_hot(labels, num_classes=self.config.num_classes).float()
        
        # Focal loss formula
        ce_loss = -targets * torch.log(probs + 1e-8)
        pt = torch.where(targets == 1, probs, 1 - probs)
        focal_weight = (1 - pt) ** self.config.focal_gamma
        
        alpha_t = torch.where(
            targets == 1, 
            self.config.focal_alpha, 
            1 - self.config.focal_alpha
        )
        
        loss = (alpha_t * focal_weight * ce_loss).sum(dim=1).mean()
        return loss
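
# -----------------------------------------------------------------------------
# Example usage (sketch, not part of the original listing): running HLST on a
# dummy clip. Shapes are illustrative and assume the SystemConfig defaults above.
# -----------------------------------------------------------------------------
# config = SystemConfig()
# model = HLST(config)
# clip = torch.randn(1, 16, 1, 224, 224)    # (B, T, 1, H, W) dummy frames
# masks = torch.ones(1, 16, 1, 224, 224)    # matching dummy lesion masks
# logits = model(clip, masks)               # (1, num_classes)
# loss = model.compute_loss(logits, torch.tensor([1]))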


# =============================================================================
# 4. 3D RECONSTRUCTION
# =============================================================================

class VolumeReconstructor:
    """
    3D volume reconstruction from tracked freehand ultrasound
    Includes bin-filling and gradient-aware hole-filling
    """
    
    def __init__(self, config: SystemConfig):
        self.config = config
        self.voxel_spacing = np.array(config.voxel_spacing)
        
    def bin_filling(self, 
                    images: List[np.ndarray],
                    poses: List[np.ndarray],
                    masks: Optional[List[np.ndarray]] = None) -> Dict:
        """
        Map 2D ultrasound frames to 3D volume using tracked poses
        
        Args:
            images: List of 2D ultrasound images
            poses: List of 4x4 transformation matrices (image to world)
            masks: Optional segmentation masks
            
        Returns:
            Dict with intensity volume, nipple volume, lesion volume
        """
        # Determine volume bounds
        all_points = []
        for pose in poses:
            corners = self._get_image_corners(pose)
            all_points.extend(corners)
        all_points = np.array(all_points)
        
        bounds_min = all_points.min(axis=0)
        bounds_max = all_points.max(axis=0)
        
        # Create voxel grid
        grid_shape = np.ceil((bounds_max - bounds_min) / 
                            self.voxel_spacing).astype(int) + 1
        origin = bounds_min
        
        # Accumulation arrays
        intensity_acc = np.zeros(grid_shape)
        weight_acc = np.zeros(grid_shape)
        nipple_acc = np.zeros(grid_shape) if masks else None
        lesion_acc = np.zeros(grid_shape) if masks else None
        
        # Bin filling
        for img, pose, mask in zip(images, poses, masks or [None]*len(images)):
            h, w = img.shape
            
            for v in range(h):
                for u in range(w):
                    # Map pixel to 3D world coordinate
                    pixel_phys = np.array([u * self.config.pixel_spacing,
                                          v * self.config.pixel_spacing, 0, 1])
                    world_coord = pose @ pixel_phys
                    world_coord = world_coord[:3]
                    
                    # Find voxel index
                    voxel_idx = np.floor((world_coord - origin) / 
                                        self.voxel_spacing).astype(int)
                    
                    if np.all(voxel_idx >= 0) and np.all(voxel_idx < grid_shape):
                        intensity_acc[tuple(voxel_idx)] += img[v, u]
                        weight_acc[tuple(voxel_idx)] += 1
                        
                        if mask is not None:
                            if mask[v, u] == 1:  # Nipple
                                nipple_acc[tuple(voxel_idx)] += 1
                            elif mask[v, u] == 2:  # Lesion
                                lesion_acc[tuple(voxel_idx)] += 1
        
        # Normalize intensity
        intensity_vol = np.where(weight_acc > 0, 
                                  intensity_acc / weight_acc, 0)
        
        # Binarize segmentation volumes
        nipple_vol = (nipple_acc > 0).astype(np.float32) if masks else None
        lesion_vol = (lesion_acc > 0).astype(np.float32) if masks else None
        
        return {
            'intensity': intensity_vol,
            'nipple': nipple_vol,
            'lesion': lesion_vol,
            'weights': weight_acc,
            'origin': origin,
            'shape': grid_shape
        }
    
    def _get_image_corners(self, pose: np.ndarray) -> np.ndarray:
        """Get 3D coordinates of image corners"""
        h, w = self.config.image_height, self.config.image_width
        
        corners = np.array([
            [0, 0, 0, 1],
            [w * self.config.pixel_spacing, 0, 0, 1],
            [0, h * self.config.pixel_spacing, 0, 1],
            [w * self.config.pixel_spacing, h * self.config.pixel_spacing, 0, 1]
        ])
        
        world_corners = (pose @ corners.T).T
        return world_corners[:, :3]
    
    def gradient_aware_hole_filling(self, 
                                     volume: np.ndarray,
                                     weights: np.ndarray,
                                     iterations: int = 50) -> np.ndarray:
        """
        Gradient-aware iterative inpainting for hole filling
        
        Based on Wen et al. 2013, modified for ultrasound reconstruction
        """
        filled = volume.copy()
        hole_mask = (weights == 0)
        
        # Define hole domain and boundary
        structure = np.ones((3, 3, 3), dtype=bool)
        dilated = ndimage.binary_dilation(weights > 0, structure, iterations=2)
        eroded = ndimage.binary_erosion(weights > 0, structure, iterations=1)
        
        domain = dilated & ~eroded  # Hole region with boundary
        known = weights > 0
        
        # Narrow band initialization
        boundary = domain & ndimage.binary_dilation(known, structure)
        
        for iter in range(iterations):
            new_filled = filled.copy()
            new_boundary = np.zeros_like(boundary)
            
            for idx in np.argwhere(boundary):
                idx = tuple(idx)
                neighborhood = self._get_neighborhood(idx, filled.shape)
                
                # Compute weights based on gradient alignment
                weights_local = []
                values = []
                
                for nb in neighborhood:
                    if known[nb] or (not hole_mask[nb] and iter > 0):
                        # Compute gradient direction
                        grad = self._compute_gradient(filled, nb)
                        direction = np.array(idx) - np.array(nb)
                        direction = direction / (np.linalg.norm(direction) + 1e-8)
                        
                        # Anisotropic weight
                        w = 1.0 / (1 + np.linalg.norm(direction)**2)
                        w *= (1 + np.dot(grad, direction))
                        
                        weights_local.append(max(w, 0))
                        values.append(filled[nb])
                
                if weights_local:
                    new_filled[idx] = np.average(values, weights=weights_local)
                    new_boundary[idx] = True
            
            filled = new_filled
            boundary = new_boundary & hole_mask
            
            if not boundary.any():
                break
        
        return filled
    
    def _get_neighborhood(self, idx: Tuple, shape: Tuple, radius: int = 1) -> List:
        """Get 26-connected neighborhood"""
        neighborhood = []
        for dz in range(-radius, radius+1):
            for dy in range(-radius, radius+1):
                for dx in range(-radius, radius+1):
                    if dz == dy == dx == 0:
                        continue
                    nb = (idx[0]+dz, idx[1]+dy, idx[2]+dx)
                    if all(0 <= nb[i] < shape[i] for i in range(3)):
                        neighborhood.append(nb)
        return neighborhood
    
    def _compute_gradient(self, volume: np.ndarray, 
                          idx: Tuple, 
                          step: int = 1) -> np.ndarray:
        """Compute normalized gradient at voxel"""
        grad = np.zeros(3)
        for i in range(3):
            idx_plus = list(idx)
            idx_minus = list(idx)
            idx_plus[i] = min(idx[i] + step, volume.shape[i] - 1)
            idx_minus[i] = max(idx[i] - step, 0)
            
            grad[i] = volume[tuple(idx_plus)] - volume[tuple(idx_minus)]
        
        norm = np.linalg.norm(grad)
        return grad / (norm + 1e-8) if norm > 0 else grad
    
    def post_process_masks(self, volume: np.ndarray) -> np.ndarray:
        """
        Morphological opening/closing for segmentation volumes
        """
        from scipy import ndimage
        
        # 3D morphological operations
        structure = np.ones((3, 3, 3))
        
        # Opening: remove noise
        opened = ndimage.binary_opening(volume > 0, structure)
        # Closing: fill holes
        closed = ndimage.binary_closing(opened, structure)
        
        return closed.astype(np.float32)


# =============================================================================
# 5. NIPPLE-CENTRIC LOCALIZATION
# =============================================================================

class NippleCentricLocalizer:
    """
    Automated nipple-centric lesion localization with geometry-adaptive 
    clock projection
    """
    
    def __init__(self, config: SystemConfig):
        self.config = config
        
    def compute_craniocaudal_direction(self, 
                                        poses: List[np.ndarray],
                                        initial_ratio: float = 0.2) -> np.ndarray:
        """
        Compute craniocaudal direction from initial scanning phase
        
        Args:
            poses: List of 4x4 transformation matrices
            initial_ratio: ratio of initial frames to use
            
        Returns:
            direction: 3D unit vector pointing cranial to caudal
        """
        n_initial = int(len(poses) * initial_ratio)
        
        # Extract translation components
        positions = np.array([p[:3, 3] for p in poses])
        
        # First half vs second half of initial phase
        first_half = positions[:n_initial//2].mean(axis=0)
        second_half = positions[n_initial//2:n_initial].mean(axis=0)
        
        direction = second_half - first_half
        direction = direction / np.linalg.norm(direction)
        
        return direction
    
    def fit_ellipsoid(self, 
                      surface_points: np.ndarray,
                      lesion_points: List[np.ndarray]) -> Dict:
        """
        Fit ellipsoid to breast surface with lesion constraint
        
        Args:
            surface_points: Nx3 array of surface voxel coordinates
            lesion_points: List of Mx3 arrays for each lesion
            
        Returns:
            ellipsoid parameters: center, semi-axes, rotation matrix
        """
        # Downsample surface points
        if len(surface_points) > 5000:
            indices = np.random.choice(len(surface_points), 5000, replace=False)
            surface_points = surface_points[indices]
        
        # Initial guess: PCA-based ellipsoid
        center_init = surface_points.mean(axis=0)
        cov = np.cov(surface_points.T)
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        
        # Sort descending
        order = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[order]
        eigenvectors = eigenvectors[:, order]
        
        # Initial semi-axes (2*sqrt(eigenvalue) for 95% confidence)
        a_init = 2 * np.sqrt(eigenvalues[0])
        b_init = 2 * np.sqrt(eigenvalues[1])
        c_init = 2 * np.sqrt(eigenvalues[2])
        
        # Rotation angles from eigenvectors
        R_init = eigenvectors
        
        # Optimization
        def objective(params):
            center = params[:3]
            a, b, c = sorted(params[3:6], reverse=True)  # enforce a>=b>=c
            alpha, beta, gamma = params[6:9]
            
            # Construct rotation matrix
            R_mat = R.from_euler('zyx', [alpha, beta, gamma]).as_matrix()
            
            # Ellipsoid error for surface points
            D = np.diag([1/a**2, 1/b**2, 1/c**2])
            transformed = (surface_points - center) @ R_mat
            errors = np.sum(transformed * (transformed @ D), axis=1) - 1
            
            # Constraint: lesions must be inside ellipsoid
            constraint_penalty = 0
            for lesion in lesion_points:
                transformed_lesion = (lesion - center) @ R_mat
                dist = np.sum(transformed_lesion * (transformed_lesion @ D), 
                             axis=1)
                outside = dist > 1
                constraint_penalty += np.sum((dist[outside] - 1)**2)
            
            return np.sum(errors**2) + 1e6 * constraint_penalty
        
        x0 = np.concatenate([
            center_init,
            [a_init, b_init, c_init],
            R.from_matrix(R_init).as_euler('zyx')
        ])
        
        bounds = [
            (None, None), (None, None), (None, None),  # center
            (0, None), (0, None), (0, None),  # semi-axes
            (-np.pi, np.pi), (-np.pi, np.pi), (-np.pi, np.pi)  # angles
        ]
        
        result = minimize(objective, x0, method='L-BFGS-B', bounds=bounds)
        
        opt_params = result.x
        center = opt_params[:3]
        a, b, c = sorted(opt_params[3:6], reverse=True)
        R_mat = R.from_euler('zyx', opt_params[6:9]).as_matrix()
        
        return {
            'center': center,
            'semi_axes': (a, b, c),
            'rotation': R_mat,
            'normal': R_mat[:, 2]  # Minor axis (c) direction
        }
    
    def compute_clock_projection(self,
                                  nipple_center: np.ndarray,
                                  lesion_centers: List[np.ndarray],
                                  ellipsoid: Dict,
                                  craniocaudal_dir: np.ndarray) -> List[Dict]:
        """
        Project lesions onto the nipple-centric clock face.

        NOTE: the original listing is truncated at this point; the body below
        is a minimal sketch following the atan2-based formulation described
        in the article.
        """
        normal = ellipsoid['normal'] / np.linalg.norm(ellipsoid['normal'])
        # Craniocaudal reference projected onto the projection plane
        cc = craniocaudal_dir - np.dot(craniocaudal_dir, normal) * normal
        cc /= (np.linalg.norm(cc) + 1e-8)

        results = []
        for lesion_center in lesion_centers:
            v_nl = lesion_center - nipple_center
            v_proj = v_nl - np.dot(v_nl, normal) * normal
            v_proj /= (np.linalg.norm(v_proj) + 1e-8)
            # Signed clockwise angle relative to the craniocaudal reference
            angle = math.atan2(np.dot(np.cross(v_proj, cc), normal),
                               np.dot(cc, v_proj)) % (2 * np.pi)
            clock_hour = round(angle / (2 * np.pi) * 12) or 12  # map 0 -> 12 o'clock
            results.append({'clock_hour': clock_hour,
                            'angle_rad': angle,
                            'distance_to_nipple': float(np.linalg.norm(v_nl))})
        return results
