In the fast-evolving world of medical diagnostics, early and accurate detection of bacterial infections can mean the difference between life and death. Yet, traditional methods remain slow, invasive, and often inaccurate. Now, a groundbreaking new AI-powered solution, EmiNet, is changing the game. Developed by researchers at the University of Edinburgh, EmiNet leverages synthetic data and advanced deep learning to detect moving bacteria in real-time optical endomicroscopy (OEM) images with unprecedented accuracy.
This isn't just another incremental update; it's a paradigm shift in how we diagnose infections in the lungs and beyond. In this article, we'll explore 7 revolutionary breakthroughs behind EmiNet, reveal how it outperforms outdated detection methods, and explain why this technology could transform modern medicine.
1. The Problem: Why Old Bacteria Detection Methods Fail
Before diving into EmiNet's innovations, it's crucial to understand why existing approaches fall short.
Limitations of Traditional Techniques:
- Slow turnaround times: Culturing bacteria takes 24β72 hours.
- Low spatial resolution: Imaging techniques often miss small or moving pathogens.
- Subjective annotations: Human-labeled medical images lack consistency.
- Data scarcity: Supervised AI models need large annotated datasets, which are expensive, time-consuming, and rare in medicine.
Optical endomicroscopy (OEM) offers real-time, in vivo imaging, but interpreting these complex sequences manually is nearly impossible. Enter the need for automated, AI-driven detection.
“Annotating OEM images is not only expensive and time-consuming, but demands a profound level of medical expertise.” (Demirel et al., 2025)
2. The Solution: EmiNet, a Next-Gen AI for Bacteria Detection
EmiNet is a novel deep learning architecture designed specifically to detect moving bacteria in 3D OEM image sequences. Unlike conventional models, EmiNet doesn't rely on scarce real-world annotated data. Instead, it's trained entirely on synthetic data, a game-changer for scalability and performance.
Key Features of EmiNet:
- Dual-stream architecture: Appearance + Motion analysis
- Hybrid CNN-Transformer encoder for local and global feature extraction
- Proprietary MMCCA module for cross-modal attention
- Trained on fully controlled synthetic bacteria motion patterns
- Tested on real ex vivo human lung tissue
This combination allows EmiNet to detect subtle bacterial movements invisible to the human eye, and even to older AI models.
3. Breakthrough #1: Synthetic Data That Mimics Reality
One of EmiNet's most powerful innovations is its synthetic training dataset. Since real annotated OEM images with bacteria are extremely limited, the team generated photorealistic synthetic sequences with known ground truth.
How Synthetic Bacteria Are Simulated:
The model simulates bacterial motion using a random walk algorithm with controlled parameters:
\begin{align*}
\textbf{1. Initial position:} &\quad (x_0, y_0) \\
\textbf{2. At each time step } t\textbf{:} &\\
\theta_t &\sim U(0, 360^\circ) \quad \text{(random direction)} \\
D_t &= \text{step length (sampled)} \\
\Delta x &= D_t \cdot \cos(\theta_t), \quad \Delta y = D_t \cdot \sin(\theta_t) \\
(x_{t+1}, y_{t+1}) &= (x_t + \Delta x,\; y_t + \Delta y) \\
\textbf{3. PSF (Point Spread Function):} &\quad \text{generated at each new position}
\end{align*}
This mimics real bacterial motility patterns (e.g., E. coli flagellar movement) while ensuring perfect pixel-level annotations.
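To make the simulation loop concrete, here is a minimal NumPy sketch of the random-walk trajectory generation (not the authors' generator; the step-length range and frame count are illustrative assumptions, and the full pipeline, including PSF rendering, appears in the code listing at the end of this article):

import numpy as np

def random_walk_positions(num_steps, x0, y0, max_step=5.0, rng=None):
    """Simulate a 2D random walk: uniform random direction, sampled step length per frame."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = x0, y0
    positions = [(x, y)]
    for _ in range(num_steps):
        theta = rng.uniform(0.0, 2.0 * np.pi)   # random direction
        step = rng.uniform(0.0, max_step)       # sampled step length D_t
        x, y = x + step * np.cos(theta), y + step * np.sin(theta)
        positions.append((x, y))                # a Gaussian PSF is rendered at each position
    return np.array(positions)

# Example: a 32-frame trajectory starting at the centre of a 128 x 128 frame
trajectory = random_walk_positions(num_steps=31, x0=64.0, y0=64.0)
print(trajectory.shape)  # (32, 2)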
Why this matters: Synthetic data eliminates annotation bias and enables unlimited, diverse training samples, a huge advantage over real data.
4. Breakthrough #2: Dual-Stream Architecture for Motion + Appearance
EmiNet uses two parallel streams to analyze both the visual appearance of bacteria and their dynamic motion patterns.
STREAM | PURPOSE | METHOD |
---|---|---|
Appearance Stream | Captures shape, texture, intensity | 3D CNN blocks |
Motion Stream | Detects movement trajectories | Optical flow + temporal modeling |
The motion stream is especially critical, because bacteria move and static segmentation models often miss them.
EmiNet doesn't just “see” bacteria; it watches them move.
This dual approach allows the model to distinguish true bacteria from static artifacts or noise, a common pitfall in unsupervised methods.
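For intuition, the motion stream's input can be derived from frame-to-frame optical flow. A minimal sketch using OpenCV's Farneback estimator (mirroring the simplified flow computation in the full listing at the end of this article; the parameter values are common defaults, not necessarily the paper's settings):

import numpy as np
import cv2

def flow_magnitude(prev_frame, curr_frame):
    """Dense optical-flow magnitude between two consecutive 8-bit grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_frame, curr_frame, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return mag  # same H x W shape as the input frames

# Example with two random frames standing in for consecutive OEM frames
f0 = (np.random.rand(128, 128) * 255).astype(np.uint8)
f1 = (np.random.rand(128, 128) * 255).astype(np.uint8)
print(flow_magnitude(f0, f1).shape)  # (128, 128)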
5. Breakthrough #3: CNN-Transformer Hybrid for Global Context
Most medical segmentation models (like 3D UNet) rely solely on convolutional neural networks (CNNs). While effective for local patterns, CNNs struggle with long-range dependencies, which are crucial when tracking bacteria across frames.
EmiNet solves this by integrating Transformer blocks into the encoder:
\[
z_{l} = \text{Transformer}\big(\text{LayerNorm}(F_{s})\big), \qquad F_s \in \mathbb{R}^{C \times 2^{\,s-1}H \times 2^{\,s-1}W \times T}
\]
where \(F_s\) is the downsampled feature map. Transformers provide a global receptive field, enabling the model to:
- Track bacteria across distant frames
- Maintain segmentation consistency
- Capture complex spatiotemporal relationships
This hybrid design outperforms pure CNN or pure Transformer models, combining the best of both worlds.
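As a schematic illustration of this idea, a Transformer layer can be applied to a flattened 3D feature map roughly as follows (a simplified Keras sketch; the hybrid encoder in the full listing at the end of this article adds the positional embedding and MLP sub-block):

import tensorflow as tf
from tensorflow.keras import layers

def transformer_on_feature_map(feature_map, num_heads=8):
    """Self-attention across every spatio-temporal position of a 3D feature map."""
    _, d, h, w, c = feature_map.shape
    tokens = layers.Reshape((d * h * w, c))(feature_map)        # flatten to a token sequence
    normed = layers.LayerNormalization(epsilon=1e-6)(tokens)    # LayerNorm(F_s)
    attended = layers.MultiHeadAttention(num_heads=num_heads,
                                         key_dim=c // num_heads)(normed, normed)
    tokens = layers.Add()([tokens, attended])                   # residual connection
    return layers.Reshape((d, h, w, c))(tokens)

# Example: a bottleneck feature map of shape (frames=2, 8, 8, 128 channels)
inp = tf.keras.Input(shape=(2, 8, 8, 128))
out = transformer_on_feature_map(inp)
print(tf.keras.Model(inp, out).output_shape)  # (None, 2, 8, 8, 128)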
6. Breakthrough #4: MMCCA (Multi-Modal Cross-Channel Attention)
At the heart of EmiNet lies the MMCCA (Multi-Modal Cross-Channel Attention) module, a novel fusion mechanism that allows appearance features to attend to motion features (and vice versa).
How MMCCA Works:
- Extract high-level features from both streams
- Compute channel-wise attention between modalities
- Fuse features using learned weights based on semantic relevance
This ensures that motion cues enhance appearance detection, and vice versa, leading to more robust segmentation.
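In symbols, one plausible reading of this fusion (our notation, consistent with the implementation sketch at the end of this article; the paper's exact formulation may differ) is:
\[
\hat{A} = W_{a}A, \quad \hat{M} = W_{m}M, \quad \tilde{M} = W_{\tilde{m}}M, \qquad
G = \mathrm{softmax}\big(\hat{M}^{\top}\hat{A}\big), \qquad
O = W_{u}\big(\tilde{M}G\big) + A
\]
where \(A\) and \(M\) are the flattened appearance and motion feature maps (positions \(\times\) reduced channels), \(G\) is the channel-to-channel attention map, and \(O\) is the fused output passed on to the decoder.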
Think of it like your brain combining sight and motion to recognize a moving object, but at the pixel level.
7. Breakthrough #5: EmiNet Outperforms State-of-the-Art Models
The proof is in the performance. EmiNet was tested against 7 leading 3D segmentation models, including:
- 3D UNet
- VNET
- Attention 3D UNet
- UNETR
- TransBTS
- HmsU-Net
- MS-TCNET
Quantitative Results on Synthetic Dataset (Table 3):
MODEL | F1-SCORE | DICE COEFFICIENT | CORRELATION |
---|---|---|---|
EmiNet (Ours) | 0.941 | 0.912 | 0.963 |
MS-TCNET | 0.914 | 0.881 | 0.932 |
TransBTS | 0.902 | 0.867 | 0.918 |
UNETR | 0.891 | 0.854 | 0.901 |
3D UNet | 0.863 | 0.821 | 0.874 |
EmiNet achieves the highest scores across all metrics, proving its superiority in detecting both Motion Pattern 1 and Motion Pattern 2 bacteria.
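For reference, the Dice coefficient reported in this table can be computed from binary masks as below (a generic definition, not the paper's evaluation code; note that pixel-wise F1 coincides with Dice for binary masks, so the paper's separate F1 score is presumably computed at the detection level):

import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example on two random 3D masks (frames x height x width)
p = np.random.rand(32, 128, 128) > 0.5
t = np.random.rand(32, 128, 128) > 0.5
print(round(dice_coefficient(p, t), 3))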
Real-World Validation: Tested on Human Lung Tissue
To ensure clinical relevance, EmiNet was evaluated on real ex vivo human lungs infected with E. coli and imaged via OEM.
Key Findings:
- EmiNet trained on synthetic data successfully generalizes to real-world images
- Outperforms the best unsupervised method by 0.183 in correlation with expert annotation counts (0.963 vs. 0.780)
- Achieves higher F1-scores than models trained on real annotated data
“Our synthetic data is comparable to real data and can be effectively used for training.” (Demirel et al.)
This is a huge win: it means high-quality AI models can be built without relying on expensive, error-prone human annotations.
If you're interested in medical image segmentation, you may also find this article helpful: 7 Revolutionary Breakthroughs in Thyroid Cancer AI: How DualSwinUnet++ Outperforms Old Models
Visual Turing Test: Can Experts Tell Real from Synthetic?
To validate the realism of the synthetic data, a Visual Turing Test (VTT) was conducted with 3 medical imaging experts.
VTT Results (Table 2):
EXPERT | ACCURACY | SENSITIVITY | SPECIFICITY | UNSURE SEQUENCES |
---|---|---|---|---|
Expert 1 | 52.3% | 44.8% | 66.7% | 5 |
Expert 2 | 39.0% | 36.7% | 45.5% | 9 |
Expert 3 | 37.0% | 32.3% | 46.7% | 3 |
None of the experts could reliably distinguish synthetic from real sequences, showing that the synthetic data is visually indistinguishable from real OEM images.
This is critical: if trained experts can't tell the difference, models trained on synthetic data are far more likely to transfer to real clinical settings.
Data Scale Matters: Larger Synthetic Sets = Better Performance
The study tested EmiNet on two synthetic datasets:
- Ssmall: Smaller, augmented dataset
- Slarge: Full-scale synthetic dataset (no augmentation)
Key Insight:
More synthetic data = better performance
TRAINING SET | CORRELATION | F1-SCORE |
---|---|---|
Ssmall (real + synthetic) | 0.780 | 0.832 |
Slarge (synthetic only) | 0.963 | 0.941 |
A 23.5% relative increase in correlation (from 0.780 to 0.963) when scaling up the synthetic training data.
This proves that data quantity and quality in simulation are key to high-performance AI in medicine.
EmiNet vs. Unsupervised Methods: Why Supervision Wins
While unsupervised methods avoid annotation costs, they lack precision.
METHOD TYPE | CORRELATION | F1-SCORE | GROUND TRUTH? |
---|---|---|---|
Supervised (EmiNet) | 0.963 | 0.941 | Yes (synthetic) |
Unsupervised | 0.780 | 0.832 | No |
EmiNet exceeds the best unsupervised method by:
- +0.183 in correlation (0.963 vs. 0.780)
- +0.109 in F1-score (0.941 vs. 0.832)
Supervised learning with synthetic ground truth beats guesswork every time.
Clinical Impact: Faster, More Accurate Infection Diagnosis
Imagine a future where:
- Doctors can see live bacteria in a patient's lungs during bronchoscopy
- AI instantly highlights infection zones in real-time
- Antibiotics are prescribed within minutes, not days
That future is closer than you think, thanks to EmiNet.
Potential Applications:
- Pneumonia diagnosis in ICU patients
- Cystic fibrosis monitoring
- Ventilator-associated infections
- Drug development for antimicrobials
“This work was supported by EPSRC for next-generation sensing in human in vivo pharmacology.”
The Future of Medical AI: Synthetic Data as the New Gold Standard
EmiNet isn't just a model; it's a blueprint for the future of medical AI.
Why This Approach Will Dominate:
- Eliminates the need for costly annotations
- Enables unlimited, diverse training data
- Ensures perfect ground truth
- Generalizes to real-world scenarios
- Accelerates research and deployment
As regulatory bodies and hospitals embrace AI, synthetic data training will become the standard, just like in autonomous vehicles or robotics.
Call to Action: Join the AI Revolution in Medicine
The era of AI-powered infection detection is here. Whether you're a:
- Clinician tired of waiting for culture results
- Researcher building the next diagnostic tool
- Developer working on medical imaging AI
- Investor in health tech innovation
… now is the time to get involved.
Want to access the EmiNet code or synthetic data generator?
Visit the University of Edinburgh's Translational Healthcare Technologies Lab or check the Computers in Biology and Medicine journal for open-access details.
Subscribe to our newsletter for updates on AI in medical imaging, synthetic data breakthroughs, and real-world clinical integrations.
Conclusion: EmiNet Is a Game-Changer
In a world where speed and accuracy save lives, EmiNet delivers both. By combining:
- Synthetic data generation
- Dual-stream motion-appearance analysis
- Transformer-enhanced global modeling
- MMCCA fusion module
- Superior performance over SOTA models
- Validation on real human tissue
- Scalable, annotation-free training
… EmiNet sets a new benchmark in bacteria detection.
It's not just an improvement; it's a revolution.
Say goodbye to slow, subjective diagnostics.
Say hello to real-time, AI-powered, precise infection detection.
Welcome to the future of medicine.
Below is an illustrative, end-to-end Python sketch of the EmiNet architecture and the synthetic data generation process described above. It is a reconstruction based on the paper's description rather than the authors' official implementation, so details such as filter counts, attention heads, the optical-flow representation, and parameter ranges are assumptions made for demonstration purposes.
import tensorflow as tf
from tensorflow.keras.layers import (
    Input, Conv3D, MaxPooling3D, UpSampling3D, concatenate,
    LayerNormalization, MultiHeadAttention, Dense, Add, Reshape,
    Layer, Softmax
)
from tensorflow.keras.models import Model
import numpy as np
import cv2
# ==============================================================================
# EmiNet Model Architecture
# ==============================================================================
class MultiModalCrossChannelAttention(Layer):
"""
Implementation of the Multi-Modal Cross-Channel Attention (MMCCA) module.
This module fuses appearance and motion features using cross-channel attention.
"""
def __init__(self, reduction_ratio=2, **kwargs):
super(MultiModalCrossChannelAttention, self).__init__(**kwargs)
self.reduction_ratio = reduction_ratio
def build(self, input_shape):
appearance_shape, motion_shape = input_shape
channels = appearance_shape[-1]
reduced_channels = channels // self.reduction_ratio
self.conv_a_hat = Conv3D(reduced_channels, (1, 1, 1), activation=None, name='conv_a_hat')
self.conv_m_hat = Conv3D(reduced_channels, (1, 1, 1), activation=None, name='conv_m_hat')
self.conv_m_tilde = Conv3D(reduced_channels, (1, 1, 1), activation=None, name='conv_m_tilde')
self.conv_u = Conv3D(channels, (1, 1, 1), activation=None, name='conv_u')
self.softmax = Softmax(axis=-1)
super(MultiModalCrossChannelAttention, self).build(input_shape)
def call(self, inputs):
appearance_features, motion_features = inputs
# Generate new feature maps
a_hat = self.conv_a_hat(appearance_features)
m_hat = self.conv_m_hat(motion_features)
m_tilde = self.conv_m_tilde(motion_features)
# Reshape for matrix multiplication
_, d, h, w, c_reduced = a_hat.shape
a_hat_reshaped = tf.reshape(a_hat, (-1, d * h * w, c_reduced))
m_hat_reshaped = tf.reshape(m_hat, (-1, d * h * w, c_reduced))
m_tilde_reshaped = tf.reshape(m_tilde, (-1, d * h * w, c_reduced))
        # Cross-channel attention map G of shape (B, C', C')
        m_hat_T = tf.transpose(m_hat_reshaped, perm=[0, 2, 1])
        attention_map_G = tf.matmul(m_hat_T, a_hat_reshaped)
        attention_map_G = self.softmax(attention_map_G)
        # Apply the channel attention map to m_tilde: (B, N, C') x (B, C', C') -> (B, N, C')
        attended_motion = tf.matmul(m_tilde_reshaped, attention_map_G)
attended_motion_reshaped = tf.reshape(attended_motion, (-1, d, h, w, c_reduced))
# Project back to original channel dimension
U = self.conv_u(attended_motion_reshaped)
# Final output with element-wise sum
output = Add()([U, appearance_features])
return output
class AddPositionEmbedding(Layer):
    """Adds a learnable positional embedding to every spatio-temporal position of a feature map."""
    def build(self, input_shape):
        # input_shape: (batch, D, H, W, C) -> one learnable value per position and channel
        self.pos_embedding = self.add_weight(
            name='pos_embedding',
            shape=(1,) + tuple(input_shape[1:]),
            initializer='random_normal',
            trainable=True)
        super(AddPositionEmbedding, self).build(input_shape)
    def call(self, x):
        return x + self.pos_embedding
def transformer_block(x, num_patches, projection_dim):
"""A single Transformer block."""
# Reshape input to a sequence of patches
x_reshaped = Reshape((num_patches, x.shape[-1]))(x)
# Layer Normalization 1
x1 = LayerNormalization(epsilon=1e-6)(x_reshaped)
# Multi-Head Self-Attention
attention_output = MultiHeadAttention(num_heads=8, key_dim=projection_dim // 8)(x1, x1)
# Skip Connection 1
x2 = Add()([attention_output, x_reshaped])
# Layer Normalization 2
x3 = LayerNormalization(epsilon=1e-6)(x2)
# MLP
x3 = Dense(projection_dim * 4, activation='relu')(x3)
x3 = Dense(projection_dim, activation=None)(x3)
# Skip Connection 2
encoded_patches = Add()([x3, x2])
# Reshape back to the original feature map dimensions
output = Reshape((x.shape[1], x.shape[2], x.shape[3], projection_dim))(encoded_patches)
return output
def eminet_encoder(input_layer, name='encoder'):
"""The hybrid CNN-Transformer encoder for one stream."""
# Stage 1
c1 = Conv3D(8, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv1')(input_layer)
c1 = Conv3D(8, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv2')(c1)
p1 = MaxPooling3D((2, 2, 2), name=f'{name}_pool1')(c1)
# Stage 2
c2 = Conv3D(16, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv3')(p1)
c2 = Conv3D(16, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv4')(c2)
p2 = MaxPooling3D((2, 2, 2), name=f'{name}_pool2')(c2)
# Stage 3
c3 = Conv3D(32, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv5')(p2)
c3 = Conv3D(32, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv6')(c3)
p3 = MaxPooling3D((2, 2, 2), name=f'{name}_pool3')(c3)
# Stage 4
c4 = Conv3D(64, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv7')(p3)
c4 = Conv3D(64, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv8')(c4)
p4 = MaxPooling3D((2, 2, 2), name=f'{name}_pool4')(c4)
# Stage 5 - To be fed into Transformer
c5 = Conv3D(128, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv9')(p4)
c5 = Conv3D(128, (3, 3, 3), activation='relu', padding='same', name=f'{name}_conv10')(c5)
# Transformer Block
num_patches = (c5.shape[1] * c5.shape[2] * c5.shape[3])
projection_dim = 128
    # Positional encoding: a learnable embedding added at every spatio-temporal position
    c5_with_pos = AddPositionEmbedding(name=f'{name}_pos_embed')(c5)
transformer_out = transformer_block(c5_with_pos, num_patches, projection_dim)
return [c1, c2, c3, c4, transformer_out]
def EmiNet(input_shape=(32, 128, 128, 1), num_classes=3):
"""
The main EmiNet model architecture.
"""
# --- Inputs ---
appearance_input = Input(shape=input_shape, name='appearance_input')
motion_input = Input(shape=input_shape, name='motion_input')
# --- Encoders ---
appearance_features = eminet_encoder(appearance_input, name='appearance_encoder')
motion_features = eminet_encoder(motion_input, name='motion_encoder')
# --- Decoder ---
# Fusing the deepest features from both streams
fused_bottleneck = MultiModalCrossChannelAttention()([appearance_features[4], motion_features[4]])
# Upsampling and skip connections
u6 = UpSampling3D(size=(2, 2, 2))(fused_bottleneck)
fused_skip4 = MultiModalCrossChannelAttention()([appearance_features[3], motion_features[3]])
merge6 = concatenate([u6, fused_skip4], axis=-1)
c6 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(merge6)
c6 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(c6)
u7 = UpSampling3D(size=(2, 2, 2))(c6)
fused_skip3 = MultiModalCrossChannelAttention()([appearance_features[2], motion_features[2]])
merge7 = concatenate([u7, fused_skip3], axis=-1)
c7 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(merge7)
c7 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(c7)
u8 = UpSampling3D(size=(2, 2, 2))(c7)
fused_skip2 = MultiModalCrossChannelAttention()([appearance_features[1], motion_features[1]])
merge8 = concatenate([u8, fused_skip2], axis=-1)
c8 = Conv3D(16, (3, 3, 3), activation='relu', padding='same')(merge8)
c8 = Conv3D(16, (3, 3, 3), activation='relu', padding='same')(c8)
u9 = UpSampling3D(size=(2, 2, 2))(c8)
fused_skip1 = MultiModalCrossChannelAttention()([appearance_features[0], motion_features[0]])
merge9 = concatenate([u9, fused_skip1], axis=-1)
c9 = Conv3D(8, (3, 3, 3), activation='relu', padding='same')(merge9)
c9 = Conv3D(8, (3, 3, 3), activation='relu', padding='same')(c9)
# --- Output ---
output = Conv3D(num_classes, (1, 1, 1), activation='softmax', name='final_output')(c9)
model = Model(inputs=[appearance_input, motion_input], outputs=[output])
return model
# ==============================================================================
# Synthetic Data Generation
# ==============================================================================
def generate_psf(center_x, center_y, radius, amplitude, img_shape):
"""Generates a single Gaussian Point Spread Function (PSF) to model a bacterium."""
    x = np.arange(0, img_shape[1], 1, float)
    y = np.arange(0, img_shape[0], 1, float)
    x, y = np.meshgrid(x, y)  # both coordinate grids have shape (height, width)
psf = amplitude * np.exp(-((x - center_x)**2 + (y - center_y)**2) / (2 * radius**2))
return psf
def generate_motion_pattern_1(num_frames, img_shape):
"""Generates a sequence with a bacterium following Motion Pattern 1 (random walk)."""
sequence = np.zeros((num_frames, img_shape[0], img_shape[1]))
mask = np.zeros((num_frames, img_shape[0], img_shape[1]))
start_frame = np.random.randint(0, num_frames)
end_frame = np.random.randint(start_frame, num_frames)
# Initial position
x = np.random.uniform(0, img_shape[1])
y = np.random.uniform(0, img_shape[0])
for frame_idx in range(start_frame, end_frame + 1):
# Random walk step
angle = np.random.uniform(0, 2 * np.pi)
distance = np.random.uniform(0, 5) # Max distance per frame
x += distance * np.cos(angle)
y += distance * np.sin(angle)
# Ensure bacterium stays within bounds
x = np.clip(x, 0, img_shape[1] - 1)
y = np.clip(y, 0, img_shape[0] - 1)
# Generate PSF
radius = np.random.uniform(2, 8)
amplitude = np.random.uniform(40, 90)
psf = generate_psf(x, y, radius, amplitude, img_shape)
sequence[frame_idx] = psf
# Create mask for the bacterium
mask_psf = generate_psf(x, y, radius, 255, img_shape) > 128
mask[frame_idx][mask_psf] = 1 # Class 1 for Motion Pattern 1
return sequence, mask
def generate_motion_pattern_2(num_frames, img_shape):
"""Generates a sequence with a bacterium following Motion Pattern 2 (blinking)."""
sequence = np.zeros((num_frames, img_shape[0], img_shape[1]))
mask = np.zeros((num_frames, img_shape[0], img_shape[1]))
num_visible_frames = np.random.randint(1, num_frames // 2)
visible_frames = np.random.choice(num_frames, num_visible_frames, replace=False)
# Fixed position
x = np.random.uniform(0, img_shape[1])
y = np.random.uniform(0, img_shape[0])
radius = np.random.uniform(2, 8)
amplitude = np.random.uniform(40, 90)
for frame_idx in visible_frames:
psf = generate_psf(x, y, radius, amplitude, img_shape)
sequence[frame_idx] = psf
# Create mask for the bacterium
mask_psf = generate_psf(x, y, radius, 255, img_shape) > 128
mask[frame_idx][mask_psf] = 2 # Class 2 for Motion Pattern 2
return sequence, mask
def generate_anomalous_objects(num_frames, img_shape):
"""Generates anomalous bacteria-like objects that appear only once."""
sequence = np.zeros((num_frames, img_shape[0], img_shape[1]))
# These are part of the background, so no separate mask is needed
frame_idx = np.random.randint(0, num_frames)
x = np.random.uniform(0, img_shape[1])
y = np.random.uniform(0, img_shape[0])
radius = np.random.uniform(2, 8)
amplitude = np.random.uniform(40, 90)
psf = generate_psf(x, y, radius, amplitude, img_shape)
sequence[frame_idx] = psf
return sequence
def generate_synthetic_sequence(background_sequence):
"""
Generates a full synthetic sequence by adding bacteria to a background.
"""
num_frames, h, w = background_sequence.shape
img_shape = (h, w)
final_sequence = background_sequence.copy()
final_mask = np.zeros_like(final_sequence, dtype=np.int32)
num_bacteria = np.random.randint(0, 51)
for _ in range(num_bacteria):
if np.random.rand() < 0.8: # 80% chance for Motion Pattern 1
bact_seq, bact_mask = generate_motion_pattern_1(num_frames, img_shape)
else:
bact_seq, bact_mask = generate_motion_pattern_2(num_frames, img_shape)
final_sequence += bact_seq
# Update mask, handling overlaps by letting the last one win
final_mask[bact_mask > 0] = bact_mask[bact_mask > 0]
# Add anomalous objects
num_anomalous = np.random.randint(0, 5)
for _ in range(num_anomalous):
anomalous_seq = generate_anomalous_objects(num_frames, img_shape)
final_sequence += anomalous_seq
# Clip values to be in a valid image range (e.g., 0-255)
final_sequence = np.clip(final_sequence, 0, 255)
return final_sequence, final_mask
def generate_optical_flow(sequence):
"""
Generates optical flow for a sequence using the Farneback algorithm.
This is a simplified version; in practice, you might use a more robust implementation.
"""
num_frames, h, w = sequence.shape
flow_sequence = np.zeros((num_frames, h, w, 2), dtype=np.float32)
# Convert to 8-bit for OpenCV
sequence_8bit = sequence.astype(np.uint8)
for i in range(1, num_frames):
prev_frame = sequence_8bit[i-1]
curr_frame = sequence_8bit[i]
flow = cv2.calcOpticalFlowFarneback(prev_frame, curr_frame, None, 0.5, 3, 15, 3, 5, 1.2, 0)
# For simplicity, we'll just take the magnitude of the flow as a single channel input
mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
flow_sequence[i, :, :, 0] = mag
# The second channel can be zero or another component like angle
return flow_sequence
# ==============================================================================
# Example Usage
# ==============================================================================
if __name__ == '__main__':
# --- 1. Define Model Parameters ---
IMG_HEIGHT = 128
IMG_WIDTH = 128
NUM_FRAMES = 32
NUM_CLASSES = 3 # 0: background, 1: motion pattern 1, 2: motion pattern 2
INPUT_SHAPE = (NUM_FRAMES, IMG_HEIGHT, IMG_WIDTH, 1)
# --- 2. Create the EmiNet Model ---
print("Building EmiNet model...")
eminet_model = EmiNet(input_shape=INPUT_SHAPE, num_classes=NUM_CLASSES)
eminet_model.summary()
# --- 3. Generate a Synthetic Dataset ---
print("\nGenerating synthetic data example...")
# Create a dummy background sequence (e.g., random noise)
background = np.random.rand(NUM_FRAMES, IMG_HEIGHT, IMG_WIDTH) * 50
# Generate a single synthetic sequence and its mask
appearance_seq, mask_seq = generate_synthetic_sequence(background)
# Generate optical flow for the appearance sequence
# Note: The paper uses a 3-channel flow, here we simplify to 1 for demonstration
motion_seq = generate_optical_flow(appearance_seq)
# Reshape for model input
appearance_seq = appearance_seq[np.newaxis, ..., np.newaxis] # Add batch and channel dims
motion_seq = motion_seq[np.newaxis, ..., np.newaxis] # Add batch and channel dims
mask_seq = mask_seq[np.newaxis, ...]
print(f"Appearance sequence shape: {appearance_seq.shape}")
print(f"Motion sequence shape: {motion_seq.shape}")
print(f"Mask sequence shape: {mask_seq.shape}")
# --- 4. Compile and Train the Model (Demonstration) ---
print("\nCompiling and demonstrating a training step...")
eminet_model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# This is a demonstration of a single training step.
# For actual training, you would use a data generator and fit the model for many epochs.
history = eminet_model.fit([appearance_seq, motion_seq], mask_seq, epochs=1, batch_size=1)
print("\nTraining step completed.")
# --- 5. Prediction ---
print("\nMaking a prediction on the synthetic data...")
prediction = eminet_model.predict([appearance_seq, motion_seq])
print(f"Prediction output shape: {prediction.shape}")
predicted_mask = np.argmax(prediction, axis=-1)
print(f"Predicted mask shape after argmax: {predicted_mask.shape}")
References
[1] Demirel, M., et al. (2025). EmiNet: Moving bacteria detection on optical endomicroscopy images trained on synthetic data. Computers in Biology and Medicine, 196, 110678. https://doi.org/10.1016/j.compbiomed.2025.110678