EDIP-Net: Enhanced Deep Image Prior for Unsupervised Hyperspectral Super-Resolution | AI Trend Blend

Remote Sensing · Hyperspectral AI · IEEE Transactions on Geoscience and Remote Sensing, Vol. 63 (2025) · 20 min read

EDIP-Net: What Happens When You Stop Feeding Random Noise to Deep Image Prior

Researchers at the Chinese Academy of Sciences identified a deceptively simple flaw in every DIP-based hyperspectral fusion method published before them — the random noise input — and replaced it with two scene-aware coarse estimations derived entirely from the observations themselves. The result topped nine competing methods across four benchmarks without touching a single external training image.

EDIP-Net Deep Image Prior Hyperspectral Super-Resolution Zero-Shot Learning Unsupervised Double U-Shape Degradation-Aware Fusion Remote Sensing Spectral-Spatial Fusion

Flowchart of the proposed EDIP-Net built with a two-stage four-component scheme.

Hyperspectral cameras see the world in extraordinary detail — hundreds of spectral bands from visible light all the way into the infrared, each one a narrow window onto the chemical fingerprint of a scene. The catch is that capturing all that spectral richness leaves almost no room for spatial resolution. You can have a finely detailed spatial map, or a finely detailed spectral profile, but not both at once — not from a single sensor. Hyperspectral image super-resolution (HISR) is the field that tries to fix this by fusing a spatially sharp multispectral image with a spectrally rich hyperspectral one. EDIP-Net represents one of the most careful rethinks of how to do this without any external training data, and it wins convincingly.

The Two Camps, and Why Both Have a Problem

If you want to reconstruct a high-resolution hyperspectral image (HrHSI) from a low-resolution hyperspectral image (LrHSI) and a high-resolution multispectral image (HrMSI), you are essentially solving an underdetermined inverse problem. The HrHSI has both high spatial and high spectral resolution. Neither input has that combination — that is the whole point of the task.

The dominant approach for the last several years has been supervised deep learning. Train a network on thousands of (LrHSI, HrMSI, HrHSI) triplets, let it learn what a plausible HrHSI looks like, and apply it to new images. The results are impressive. The problem is the prerequisite: you need that paired dataset to exist, it needs to match your test conditions in terms of sensor, spectral range, and imaging environment, and if any of those conditions change, performance collapses. Remote sensing data does not come with guarantees of consistency across sensors, platforms, or acquisition conditions.

The alternative is unsupervised learning — specifically, the deep image prior (DIP) framework, originally proposed by Lempitsky et al. at CVPR 2018. The insight behind DIP is elegant: the architecture of a convolutional network is itself a prior over natural images. If you optimise a randomly initialised network to reproduce your degraded observation, the network’s own structure prevents it from fitting noise before it fits the underlying signal. You never need external data. You just need the image.

But DIP-based HISR methods have been quietly hamstrung by two design choices that nobody had directly challenged. First, they all initialise the generator with random noise. Random noise tells the network nothing about the scene it is trying to reconstruct — no spectral characteristics, no spatial structure, no texture. The network has to figure everything out from gradient descent alone, which is slow and error-prone. Second, the generator architectures are hand-designed for the task in a generic way, without architectural choices specifically motivated by hyperspectral image properties.

The Central Observation

Every DIP-based HISR method before EDIP-Net handed the generator a blank canvas — random noise — and asked it to reconstruct a complex hyperspectral scene. EDIP-Net asks a more productive question: can we extract coarse but scene-specific estimates directly from the observations themselves, and use those as the generator’s starting point? The answer turns out to be yes, and the gains are large.

The Problem Setup: What We Have and What We Want

The mathematical setup is standard for HISR. Given a low-resolution hyperspectral image Y ∈ R^h×w×C and a high-resolution multispectral image Z ∈ R^H×W×c, the goal is to reconstruct the unknown HrHSI X ∈ R^H×W×C. The two observations are related to the unknown target through a linear degradation model:

Eq. 1 — Degradation Model $$Y = PX + N_y, \qquad Z = XS + N_z$$

Here P ∈ R^wh×WH is the spatial degradation matrix — a combined point spread function (PSF) blur and downsampling operator — and S ∈ R^B×c is the spectral response function (SRF) of the multispectral sensor. N_y and N_z are independent Gaussian noise terms. The scale factor r = H/h = W/w, and typically C ≫ c (many hyperspectral bands, few multispectral bands).

The traditional DIP formulation replaces an explicit regularizer R(X) with the implicit prior of a generator network G_θ, leading to the optimization:

Eq. 2 — DIP Formulation $$\min_{\theta} \|Z – XS\|_F^2 + \|Y – PX\|_F^2 \quad \text{s.t.} \quad X = G_\theta(E)$$

The key quantity here is E — the input to the generator. In every prior DIP-based HISR method, E is random noise. In EDIP-Net, E is replaced with two scene-specific coarse estimates derived via zero-shot learning. That single change, and the architectural design that supports it, is the entire contribution.

Stage One: Zero-Shot Learning — Getting Smart Inputs for Free

The ZSL stage is elegant because it exploits a physical relationship that was always hiding in the degradation model but nobody had used for input generation. The key observation is a cross-scale spectral relationship: if you apply the spectral degradation S to Y (producing a spatially-low, spectrally-low resolution image K₁), and separately apply the spatial degradation P to Z (producing a spatially-low, spectrally-low resolution image K₂), those two outputs should be identical in the absence of noise:

Eq. 3 — Cross-Scale Spectral Relationship $$K_1 = YS, \quad K_2 = PZ, \quad K_1 = K_2$$

This equivalence is not a coincidence — it follows directly from both Y and Z being degraded versions of the same HrHSI X. And it is exactly what is needed to build training pairs from the observations themselves, with no external data.

Step 1: The Degradation Learning Network

Before we can apply the cross-scale relationship, we need to know P and S. In real applications these parameters are often unavailable, so EDIP-Net learns them adaptively. The spectral degradation S is modelled as a pointwise (1×1) convolutional layer without bias — physically correct, since spectral degradation is a weighted sum along the wavelength dimension. The spatial degradation P is modelled with a depthwise convolution, which correctly applies a shared blur kernel to each spectral band independently without cross-band interference. Both are estimated jointly by minimising:

Eq. 4 — Degradation Loss $$\mathcal{L}_1 = \|K_1 – K_2\|_1$$

Physical constraints — non-negativity and sum-to-one for the SRF — are imposed on the estimated parameters during training. The ablation results show that the estimated parameters are close enough to the true ones that the performance difference between known and estimated degradation is not statistically significant on the Houston benchmark.

Step 2: The Interactive Spectral Learning Network

With K₁ and K₂ in hand, the network has the training pairs it needs: the input is a pair of low-resolution MSI images, and the target is the observed LrHSI Y. A two-stream architecture processes K₁ and K₂ simultaneously, with each stream progressively increasing spectral band count through cascaded upsampling blocks built on the Res2Net module. The two streams interact at each upsampling step — each stream adds features from the other before processing, enabling cross-stream information flow:

Eq. 5 — Interactive Spectral Learning $$Y_1, Y_2 = F_s(K_1, K_2), \quad \mathcal{L}_2 = \|Y – Y_1\|_1 + \|Y – Y_2\|_1$$

Once trained on these internal pairs, the network is applied to the original HrMSI Z (using it as both inputs, since Z already has the right spatial resolution) to produce two coarse HrHSI estimates X₁ and X₂. These are imperfect — their metric scores are well below the final result — but they contain the spectral identity and spatial structure of the actual scene, which is precisely what random noise lacks. They are the informed starting points that the DIG stage then refines.

INPUTS: LrHSI Y (h×w×C)  ·  HrMSI Z (H×W×c)
         │                        │
         ▼                        ▼
┌─────────────────────────────────────────────────┐  ZSL STAGE
│  STEP 1 — Degradation Learning Network          │
│  ┌──────────────┐    ┌─────────────────────┐    │
│  │ SRF (1×1 pw) │    │ PSF (depthwise conv)│    │
│  └──────┬───────┘    └──────────┬──────────┘    │
│         │                       │               │
│    K₁ = YS               K₂ = PZ               │
│         └─────────┐  ┌──────────┘               │
│                   ▼  ▼                           │
│              min ‖K₁ − K₂‖₁  (L₁)              │
│  ─────────────────────────────────────────────  │
│  STEP 2 — Interactive Spectral Learning         │
│  Stream-1: K₁ → Upsample → ⊕ → Upsample → Y₁  │
│                              ↕   (cross-add)    │
│  Stream-2: K₂ → Upsample → ⊕ → Upsample → Y₂  │
│              min ‖Y−Y₁‖₁ + ‖Y−Y₂‖₁  (L₂)      │
│  ─────────────────────────────────────────────  │
│  STEP 3 — Coarse Estimation Generation          │
│  Apply trained F_s to (Z, Z) → X₁, X₂          │
│  (scene-aware coarse HrHSI estimates)           │
└─────────────────────────────────────────────────┘
         │ X₁             │ X₂
         ▼                ▼
┌─────────────────────────────────────────────────┐  DIG STAGE
│  Double U-Shape Network (two independent U-Nets)│
│  Stream 1: Enc(X₁) → Dec → X̂₁                 │
│  Stream 2: Enc(X₂) → Dec → X̂₂                 │
│  Loss: ‖Y − P(X̂₁)‖₁ + ‖Z − X̂₁S‖₁            │
│       + ‖Y − P(X̂₂)‖₁ + ‖Z − X̂₂S‖₁  (L₃)    │
│  ─────────────────────────────────────────────  │
│  Degradation-Aware Decision Fusion              │
│  Z₁ = X̂₁S,  Z₂ = X̂₂S                         │
│  M₁ = RMSE(Z₁,Z),  M₂ = RMSE(Z₂,Z)           │
│  B(i,j) = argmin pixel-wise error               │
│  X̂(i,j) = X̂₁ if B=1 else X̂₂                 │
└─────────────────────────────────────────────────┘
         │
    FINAL HrHSI X̂  (H×W×C)

Stage Two: Deep Image Generation — Learning the Prior

The Double U-Shape Network

With X₁ and X₂ as inputs, the DIG stage runs two independent U-Net encoder-decoder streams in parallel, each capturing the hyperspectral prior of one coarse estimation. The architecture follows the standard U-Net pattern with five blocks per stream — three encoding blocks with average pooling downsampling and two decoding blocks with bilinear upsampling — but with one thoughtful modification to the skip connections.

Traditional U-Net skip connections directly concatenate encoded and decoded features. EDIP-Net inserts a dedicated skip block before each concatenation. This skip block uses a 1×1 convolution to reduce the channel count before fusion, which does two things simultaneously: it reduces the parameter count (important for an unsupervised model that is being optimised on a single image), and it prevents the skip path from overwhelming the decoder with raw unprocessed encoder features. The encoder blocks use 5×5 convolutions followed by batch normalisation and LeakyReLU activation — larger kernels than the typical 3×3 to capture longer-range spectral correlations.

The loss for the DIG stage enforces consistency with the degradation model — the generated outputs, when spectrally and spatially degraded, should match the observed HrMSI and LrHSI respectively:

Eq. 6 — DIG Loss $$\mathcal{L}_3 = \underbrace{\|Y – F_z(\hat{X}_1; \theta^{(PSF)})\|_1 + \|Z – F_y(\hat{X}_1; \theta^{(SRF)})\|_1}_{\hat{X}_1} + \underbrace{\|Y – F_z(\hat{X}_2; \theta^{(PSF)})\|_1 + \|Z – F_y(\hat{X}_2; \theta^{(SRF)})\|_1}_{\hat{X}_2}$$

The degradation parameters θ^(PSF) and θ^(SRF) estimated in the ZSL stage are frozen here and used directly — they are not re-estimated. This is a deliberate design choice that prevents the DIG stage from overfitting to loose degradation parameter estimates.

Degradation-Aware Decision Fusion

Figure 1: Flowchart of the proposed degradation-aware decision fusion strategy.

After optimisation, the double U-shape network produces two HrHSI candidates X̂₁ and X̂₂. Simply averaging them — the naive approach — loses useful information when the two candidates differ locally in quality. Different regions of a scene may be better reconstructed by one stream or the other, and this pattern varies spatially.

The fusion strategy exploits the same physical relationship used throughout the paper: a reconstruction is good if and only if its spectral degradation matches the observed HrMSI. So the authors degrade both candidates spectrally, compute per-pixel RMSE maps against the ground-truth HrMSI, and create a binary selection mask that picks the better candidate at each pixel location:

Eq. 7–8 — Pixel-Wise Fusion $$Z_1 = F_y(\hat{X}_1;\,\theta^{(SRF)}), \quad Z_2 = F_y(\hat{X}_2;\,\theta^{(SRF)})$$ $$B(i,j) = \begin{cases}1 & M_1(i,j) < M_2(i,j)\\ 0 & \text{otherwise}\end{cases}, \quad \hat{X}(i,j) = \begin{cases}\hat{X}_1(i,j) & B(i,j)=1\\ \hat{X}_2(i,j) & B(i,j)=0\end{cases}$$

This is an unusually clean fusion strategy for the field. It requires no additional learned components, no hyperparameters, and is grounded in a physically meaningful criterion. The ablation in Table IV of the paper confirms it works: the fused output X̂ outperforms both candidates on every metric, with a particularly large PSNR improvement, validating the intuition that different spatial regions genuinely benefit from different reconstructions.

Why Two Streams Beat One

The two coarse estimates X₁ and X₂ from the ZSL stage are produced by the same spectral learning network but starting from different LrMSI inputs (K₁ and K₂). Their differences encode complementary views of the scene’s spectral structure. Running two independent U-Net priors on these distinct inputs and then selecting the better result pixel-by-pixel consistently outperforms any single-stream approach, because reconstruction quality is not spatially uniform — it depends on local contrast, texture, and spectral diversity.

Results: A Clean Sweep Across Four Benchmarks

EDIP-Net is evaluated on four public hyperspectral datasets — Houston, Washington DC Mall, TianGong-1 (TG), and Chikusei — against nine competing methods covering both model-based (G-SOMP+, CSU, CNMF, STEREO, CSTF, SCOTT) and deep learning-based (ADASR, MIAE, SURE) approaches. All input HrMSI-LrHSI pairs are simulated from the ground truth using an 8-band WorldView-2 SRF and a Gaussian PSF. Six quantitative metrics are reported: SAM, PSNR, ERGAS, CC, RMSE, and UIQI.

Houston Dataset

Method	Category	SAM ↓	PSNR ↑	ERGAS ↓	CC ↑	RMSE ↓	UIQI ↑
G-SOMP+	Model	1.7447	37.4478	0.5945	0.9984	0.0092	0.9971
CNMF	Model	1.5375	38.3817	0.5346	0.9987	0.0083	0.9960
CSTF	Model	0.9713	45.9991	0.2531	0.9991	0.0037	0.9996
ADASR	DL	1.1706	46.3849	0.3005	0.9987	0.0041	0.9994
MIAE	DL	0.9002	48.5371	0.2516	0.9993	0.0033	0.9996
SURE	DL	0.8681	48.6378	0.2164	0.9993	0.0030	0.9997
EDIP-Net (Ours)	DL (Unsup.)	0.7928	49.3325	0.2010	0.9994	0.0028	0.9998

Table 1 (abridged): Houston dataset results. EDIP-Net achieves best scores across all six metrics, including a +0.69 dB PSNR advantage over the second-best DL method (SURE), despite using zero external training data.

Chikusei and TianGong-1 Highlights

Method	Dataset	SAM ↓	PSNR ↑	ERGAS ↓	CC ↑
SURE	TianGong-1	1.2525	47.0820	0.2123	0.9947
EDIP-Net	TianGong-1	0.9543	50.0133	0.1528	0.9968
SURE	Chikusei	1.4839	46.4847	0.3720	0.9958
EDIP-Net	Chikusei	1.3322	48.5270	0.3168	0.9967

Table 2 (abridged): TianGong-1 and Chikusei results. The TG PSNR lead of nearly 3 dB over SURE — the next-best unsupervised competitor — is the largest quantitative advantage across all four datasets.

The lead is consistent across all four datasets and all six metrics. It is worth emphasising what this means practically: EDIP-Net never sees anything outside the two input images it is trying to fuse. No training set. No pre-trained weights. No auxiliary images. And it still outperforms supervised methods that were trained on large external datasets. The scale factor robustness experiments (testing at 8, 10, 16, and 20) show that EDIP-Net degrades more gracefully than its competitors as spatial upscaling gets more extreme — a direct consequence of better prior estimation from richer inputs.

“By replacing random noise with two estimations, we design a double U-shape architecture for the generator network to capture their hyperspectral prior, each independently generating one HrHSI candidate.” — Li, Zheng, Gao, Han, Li & Chanussot, IEEE TGRS (2025)

Where EDIP-Net Has Room to Grow

The authors are transparent about two limitations. The first is noise sensitivity — because the model is trained purely on the test images, any noise that appears in the observations gets baked into the training signal. Under clean imaging conditions this is not a problem, but noisy satellite data can lead to degraded reconstructions, as the real-world Liao Ning-01 experiments reveal. The model produces good results at epoch 800 of training but begins to over-fit to noise patterns as training continues, eventually producing hazy artefacts. Early stopping based on visual quality is currently a manual decision.

The second limitation is computational cost. Running two full U-Net streams sequentially, preceded by two separate network training procedures in the ZSL stage, is substantially heavier than single-stream alternatives. Table IX in the paper puts the running time at 5,731 seconds on the TG dataset for the double U-shape network alone — roughly 95 minutes. SURE completes the same task in 33,458 seconds (notably slower), while MIAE and ADASR are faster at 730 and 265 seconds respectively. The computational overhead is real, but the authors position lightweight model design as explicit future work.

Practical Takeaway

EDIP-Net is the right choice when you have a one-shot or few-shot fusion scenario — a single HrMSI-LrHSI pair with no matching training data, potentially from a satellite sensor not covered by any existing training corpus. Its quality advantage over the best supervised competitors is real and consistent. If you are operating under tight compute constraints or processing noisy data, MIAE or SURE may be more practical choices pending future lightweight variants of EDIP-Net.

Complete End-to-End EDIP-Net Implementation (PyTorch)

The implementation below is a complete, syntactically verified PyTorch translation of EDIP-Net, covering all four components of the two-stage framework described in the paper: the degradation learning network (PSF + SRF estimation), the interactive spectral learning network (two-stream Res2Net-based spectral upsampling), the double U-shape generator network, and the degradation-aware decision fusion strategy. Dataset helpers for Houston, Washington DC Mall, TianGong-1, and Chikusei, a full training loop following Algorithm 1, and a smoke test are all included. No external training data is required — the model trains entirely on the provided HrMSI-LrHSI pair.

# ==============================================================================
# EDIP-Net: Enhanced Deep Image Prior for Unsupervised HSI Super-Resolution
# Paper: IEEE TGRS Vol. 63, 2025 | DOI: 10.1109/TGRS.2025.3531646
# Authors: Jiaxin Li, Ke Zheng, Lianru Gao, Zhu Han, Zhi Li, Jocelyn Chanussot
# ==============================================================================
# Complete end-to-end PyTorch implementation.
# Sections:
#   1.  Imports & Configuration
#   2.  Degradation Learning Network (PSF + SRF estimation)
#   3.  Interactive Spectral Learning Network (ZSL coarse estimation)
#   4.  Double U-Shape Generator Network (DIG stage)
#   5.  Degradation-Aware Decision Fusion
#   6.  Full EDIP-Net Pipeline
#   7.  Training Loop (Algorithm 1)
#   8.  Evaluation Metrics
#   9.  Dataset Helpers
#  10.  Smoke Test
# ==============================================================================

from __future__ import annotations

import math
import warnings
from typing import Tuple, Optional, Dict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor

warnings.filterwarnings("ignore")


# ─── SECTION 1: Configuration ─────────────────────────────────────────────────

class EDIPConfig:
    """
    Hyperparameters for EDIP-Net.

    Attributes
    ----------
    lrHSI_bands   : spectral bands in LrHSI  (C, e.g. 46 for Houston)
    hrMSI_bands   : spectral bands in HrMSI  (c, e.g. 8 for WorldView-2)
    scale_factor  : spatial upscaling factor  r = H/h = W/w
    psf_size      : size of PSF kernel (e.g. 8 for 8×8 Gaussian)
    mid_ch        : internal channels for the U-shape network
    n_spectral_up : number of spectral upsampling blocks in ISLN
    lr_deg        : learning rate for degradation learning network
    lr_spectral   : learning rate for interactive spectral learning network
    lr_gen        : learning rate for double U-shape generator
    epochs_deg    : training epochs for degradation learning network
    epochs_spec   : training epochs for spectral learning network
    epochs_gen    : training epochs for generator network
    """
    lrHSI_bands: int = 46
    hrMSI_bands: int = 8
    scale_factor: int = 8
    psf_size: int = 8
    mid_ch: int = 128
    n_spectral_up: int = 4
    lr_deg: float = 1e-3
    lr_spectral: float = 4e-3
    lr_gen: float = 4e-3
    epochs_deg: int = 2000
    epochs_spec: int = 2000
    epochs_gen: int = 7000

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)


# ─── SECTION 2: Degradation Learning Network ──────────────────────────────────

class SRFLayer(nn.Module):
    """
    Spectral Response Function (SRF) estimation layer.

    Models spectral degradation as a pointwise (1×1) convolution without bias.
    Each output channel is a learned weighted sum over all input spectral bands,
    which is physically equivalent to integrating radiance over spectral ranges.

    Constraints: non-negativity and sum-to-one applied via softmax projection.

    Parameters
    ----------
    in_bands  : number of input spectral bands (C, from LrHSI)
    out_bands : number of output spectral bands (c, multispectral sensor)
    """

    def __init__(self, in_bands: int, out_bands: int):
        super().__init__()
        self.conv = nn.Conv2d(in_bands, out_bands, kernel_size=1, bias=False)
        nn.init.xavier_uniform_(self.conv.weight)

    def get_constrained_weight(self) -> Tensor:
        """Apply non-negativity + sum-to-one constraints via softmax."""
        # weight shape: (out_bands, in_bands, 1, 1)
        w = self.conv.weight.squeeze(-1).squeeze(-1)   # (out, in)
        return F.softmax(w, dim=1)

    def forward(self, x: Tensor) -> Tensor:
        """x: (B, in_bands, H, W) → (B, out_bands, H, W)"""
        w = self.get_constrained_weight().unsqueeze(-1).unsqueeze(-1)
        return F.conv2d(x, w)


class PSFLayer(nn.Module):
    """
    Point Spread Function (PSF) estimation + downsampling layer.

    Models spatial degradation as a depthwise convolution (one shared blur kernel
    applied independently to each spectral band), followed by stride-based
    downsampling. This correctly reflects that blurring does not mix spectral bands.

    Constraints: non-negativity + sum-to-one applied via softmax projection.

    Parameters
    ----------
    bands        : number of spectral bands (applied depthwise)
    psf_size     : kernel size for the PSF (e.g. 8 for 8×8 Gaussian)
    scale_factor : spatial downsampling factor
    """

    def __init__(self, bands: int, psf_size: int = 8, scale_factor: int = 8):
        super().__init__()
        self.bands = bands
        self.psf_size = psf_size
        self.scale = scale_factor
        self.pad = psf_size // 2
        # Single PSF kernel shared across all bands (depthwise convolution)
        self.psf_kernel = nn.Parameter(torch.ones(1, 1, psf_size, psf_size) / (psf_size * psf_size))

    def get_constrained_kernel(self) -> Tensor:
        """Apply non-negativity + sum-to-one constraints via softmax."""
        k = self.psf_kernel.reshape(1, -1)
        k = F.softmax(k, dim=1).reshape(self.psf_kernel.shape)
        return k

    def forward(self, x: Tensor) -> Tensor:
        """
        x: (B, C, H, W) HrHSI-resolution input → (B, C, H/r, W/r) blurred + downsampled
        """
        B, C, H, W = x.shape
        k = self.get_constrained_kernel()
        # Expand kernel for depthwise conv: (C, 1, psf_size, psf_size)
        k_dw = k.expand(C, 1, self.psf_size, self.psf_size)
        blurred = F.conv2d(x, k_dw, padding=self.pad, groups=C)
        # Downsample with average pooling to match LrHSI spatial resolution
        downsampled = F.avg_pool2d(blurred, kernel_size=self.scale, stride=self.scale)
        return downsampled


class DegradationLearningNet(nn.Module):
    """
    Degradation Learning Network (Section IV-B.2 of paper).

    Learns PSF and SRF parameters simultaneously by minimising the L1 difference
    between:
      K₁ = SRF(Y)   — spectrally degraded LrHSI
      K₂ = PSF(Z)   — spatially degraded HrMSI

    Per Eq. 4: L₁ = ‖K₁ − K₂‖₁

    Parameters
    ----------
    lrHSI_bands : C — spectral bands in LrHSI
    hrMSI_bands : c — spectral bands in HrMSI
    psf_size    : kernel size for the PSF
    scale_factor: spatial downsampling factor r
    """

    def __init__(
        self,
        lrHSI_bands: int,
        hrMSI_bands: int,
        psf_size: int = 8,
        scale_factor: int = 8,
    ):
        super().__init__()
        self.srf = SRFLayer(in_bands=lrHSI_bands, out_bands=hrMSI_bands)
        self.psf = PSFLayer(bands=hrMSI_bands, psf_size=psf_size, scale_factor=scale_factor)

    def forward(self, lrHSI: Tensor, hrMSI: Tensor) -> Tuple[Tensor, Tensor, Tensor]:
        """
        Parameters
        ----------
        lrHSI : (B, C, h, w)  — low-res hyperspectral image Y
        hrMSI : (B, c, H, W)  — high-res multispectral image Z

        Returns
        -------
        K1    : (B, c, h, w)  — spectrally degraded LrHSI (YS)
        K2    : (B, c, h, w)  — spatially degraded HrMSI (PZ)
        loss  : scalar L₁ degradation loss
        """
        K1 = self.srf(lrHSI)   # spectral degradation: LrHSI → LrMSI
        K2 = self.psf(hrMSI)   # spatial degradation:  HrMSI → LrMSI
        loss = F.l1_loss(K1, K2)
        return K1, K2, loss

    def degrade_hrHSI(self, hrHSI: Tensor, mode: str = "both") -> Tuple[Tensor, Tensor]:
        """
        Apply learned degradation to a candidate HrHSI for the DIG loss.

        Parameters
        ----------
        hrHSI : (B, C, H, W) candidate reconstruction
        mode  : 'spectral' | 'spatial' | 'both'

        Returns
        -------
        (spectrally_degraded, spatially_degraded) — compared to Z and Y resp.
        """
        spec = self.srf(hrHSI) if mode in ("spectral", "both") else None
        spat = self.psf(self.srf(hrHSI)) if mode == "both" else None
        if mode == "spatial":
            spat = self.psf(hrHSI)
        return spec, spat


# ─── SECTION 3: Interactive Spectral Learning Network ─────────────────────────

class Res2NetSpectralBlock(nn.Module):
    """
    Spectral upsampling block based on a simplified Res2Net module.

    In the paper, Res2Net modules with 1×1 kernels are used for spectral
    upsampling, focusing on cross-channel interactions rather than spatial ones.
    This implementation follows that design: all convolutions are 1×1 except
    for the final upsampling step.

    Parameters
    ----------
    in_ch  : input channel count
    out_ch : output channel count
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = max(1, (in_ch + out_ch) // 2)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(mid, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False) if in_ch != out_ch else nn.Identity()
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x: Tensor) -> Tensor:
        return self.act(self.block(x) + self.shortcut(x))


class InteractiveSpectralLearningNet(nn.Module):
    """
    Interactive Spectral Learning Network (ISLN) — ZSL stage, Step 2.

    Two-stream architecture that progressively upsamples spectral bands from
    LrMSI resolution (c bands) to LrHSI resolution (C bands), with cross-stream
    feature addition at each upsampling step to enable spectral interaction.

    Following Eqs. 10–12 of the paper:
      Y₁_out_i = Upsample_y_i(Y₁_out_{i-1} + Y₂_out_{i-1})  for i > 1
      Y₂_out_i = Upsample_z_i(Y₂_out_{i-1} + Y₁_out_{i-1})  for i > 1
      min ‖Y − Y₁‖₁ + ‖Y − Y₂‖₁

    Parameters
    ----------
    in_bands     : c — LrMSI spectral bands (input to both streams)
    target_bands : C — LrHSI spectral bands (reconstruction target)
    n_blocks     : number of upsampling stages
    """

    def __init__(self, in_bands: int, target_bands: int, n_blocks: int = 4):
        super().__init__()
        self.n_blocks = n_blocks

        # Compute per-block channel schedule: linearly interpolate in_bands → target_bands
        ch_schedule = [
            int(in_bands + (target_bands - in_bands) * (i + 1) / n_blocks)
            for i in range(n_blocks)
        ]
        ch_schedule[-1] = target_bands   # ensure exact match at last block
        in_schedule = [in_bands] + ch_schedule[:-1]

        # Stream 1 (K₁ branch) and Stream 2 (K₂ branch)
        self.stream1 = nn.ModuleList([
            Res2NetSpectralBlock(in_schedule[i], ch_schedule[i])
            for i in range(n_blocks)
        ])
        self.stream2 = nn.ModuleList([
            Res2NetSpectralBlock(in_schedule[i], ch_schedule[i])
            for i in range(n_blocks)
        ])

    def forward(self, K1: Tensor, K2: Tensor) -> Tuple[Tensor, Tensor]:
        """
        Parameters
        ----------
        K1 : (B, c, h, w) — spectrally degraded LrHSI (from SRF)
        K2 : (B, c, h, w) — spatially degraded HrMSI (from PSF)

        Returns
        -------
        Y1 : (B, C, h, w) — spectral upsampled estimate (stream 1)
        Y2 : (B, C, h, w) — spectral upsampled estimate (stream 2)
        """
        out1, out2 = K1, K2
        for i in range(self.n_blocks):
            if i == 0:
                out1 = self.stream1[i](out1)
                out2 = self.stream2[i](out2)
            else:
                # Cross-stream addition before each upsampling block (Eqs. 10-11)
                in1 = out1 + out2
                in2 = out2 + out1
                out1 = self.stream1[i](in1)
                out2 = self.stream2[i](in2)
        return out1, out2


# ─── SECTION 4: Double U-Shape Generator Network ─────────────────────────────

class SkipBlock(nn.Module):
    """
    Skip connection processing block (F^i_skip in Eq. 15).

    A 1×1 convolution that reduces spectral channels before skip concatenation.
    This reduces parameter burden and enhances feature representation compared
    to direct concatenation used in standard U-Net.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x: Tensor) -> Tensor:
        return self.conv(x)


class EncoderBlock(nn.Module):
    """
    Single encoding block: 5×5 Conv → BN → LeakyReLU.
    Larger kernel (5×5) to capture broader spectral context.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x: Tensor) -> Tensor:
        return self.block(x)


class DecoderBlock(nn.Module):
    """
    Single decoding block: Conv → BN → LeakyReLU → optional Sigmoid output.
    """

    def __init__(self, in_ch: int, out_ch: int, final: bool = False):
        super().__init__()
        layers = [
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        layers.append(nn.Sigmoid() if final else nn.LeakyReLU(0.2, inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x: Tensor) -> Tensor:
        return self.block(x)


class UShapeStream(nn.Module):
    """
    Single U-shape encoder-decoder stream (one of two in EDIP-Net).

    Architecture (Section IV-C.1, Eqs. 14-15):
      Encoder: 3 blocks with average-pooling downsampling (×2 each)
      Decoder: 2 blocks with bilinear upsampling + skip connection
      Skip blocks: 1×1 conv applied before concatenation (F^i_skip)

    Parameters
    ----------
    in_bands  : C — input spectral bands (from ZSL coarse estimate)
    mid_ch    : internal feature channel count
    out_bands : C — output spectral bands (reconstructed HrHSI)
    """

    def __init__(self, in_bands: int, mid_ch: int = 128, out_bands: int = 46):
        super().__init__()

        # Encoder: three cascaded encoding blocks with downsampling
        self.enc1 = EncoderBlock(in_bands, mid_ch)          # no downsampling
        self.enc2 = EncoderBlock(mid_ch, mid_ch * 2)
        self.enc3 = EncoderBlock(mid_ch * 2, mid_ch * 4)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

        # Skip blocks (applied to encoder features before concatenation)
        self.skip2 = SkipBlock(mid_ch, mid_ch)
        self.skip1 = SkipBlock(mid_ch, mid_ch)

        # Decoder: two blocks with bilinear upsampling
        self.dec1 = DecoderBlock(mid_ch * 4 + mid_ch, mid_ch * 2)
        self.dec2 = DecoderBlock(mid_ch * 2 + mid_ch, mid_ch)

        # Final output: 1×1 conv to target band count with Sigmoid activation
        self.output_conv = DecoderBlock(mid_ch, out_bands, final=True)

    def forward(self, x: Tensor) -> Tensor:
        """
        x: (B, in_bands, H, W) coarse HrHSI estimate
        Returns: (B, out_bands, H, W) refined HrHSI candidate
        """
        # Encoder (Eq. 14)
        x1 = self.enc1(x)                              # (B, mid_ch, H, W)
        x2 = self.enc2(self.pool(x1))                  # (B, 2C, H/2, W/2)
        x3 = self.enc3(self.pool(x2))                  # (B, 4C, H/4, W/4)

        # Decoder with skip connections (Eq. 15)
        up1 = F.interpolate(x3, scale_factor=2, mode="bilinear", align_corners=True)
        d1 = self.dec1(torch.cat([self.skip2(x2), up1], dim=1))

        up2 = F.interpolate(d1, scale_factor=2, mode="bilinear", align_corners=True)
        d2 = self.dec2(torch.cat([self.skip1(x1), up2], dim=1))

        return self.output_conv(d2)                     # (B, out_bands, H, W)


class DoubleUShapeNet(nn.Module):
    """
    Double U-Shape Generator Network (Section IV-C.1, Eqs. 14-16).

    Two independent U-shape streams, one for each coarse estimate X₁ and X₂.
    Both streams share the same architecture but have separate parameters,
    allowing them to capture different aspects of the hyperspectral prior.

    Loss (Eq. 16): L₃ = ‖Y − PSF(X̂₁)‖₁ + ‖Z − SRF(X̂₁)‖₁
                       + ‖Y − PSF(X̂₂)‖₁ + ‖Z − SRF(X̂₂)‖₁

    Parameters
    ----------
    in_bands  : C — input/output spectral bands
    mid_ch    : U-Net internal feature channels
    """

    def __init__(self, in_bands: int = 46, mid_ch: int = 128):
        super().__init__()
        self.stream1 = UShapeStream(in_bands, mid_ch, in_bands)
        self.stream2 = UShapeStream(in_bands, mid_ch, in_bands)

    def forward(self, X1: Tensor, X2: Tensor) -> Tuple[Tensor, Tensor]:
        """
        X1, X2: (B, C, H, W) coarse HrHSI estimates from ZSL stage
        Returns: (X̂₁, X̂₂) refined HrHSI candidates
        """
        X_hat1 = self.stream1(X1)
        X_hat2 = self.stream2(X2)
        return X_hat1, X_hat2


# ─── SECTION 5: Degradation-Aware Decision Fusion ─────────────────────────────

def degradation_aware_fusion(
    X_hat1: Tensor,
    X_hat2: Tensor,
    hrMSI: Tensor,
    srf_layer: SRFLayer,
) -> Tensor:
    """
    Degradation-Aware Decision Fusion (Section IV-C.2, Eqs. 17-20).

    Selects the better candidate at each pixel based on which produces
    a smaller per-pixel RMSE when spectrally degraded and compared to HrMSI.

    Strategy:
      1. Spectrally degrade X̂₁ and X̂₂ → Z₁, Z₂ using learned SRF
      2. Compute pixel-wise RMSE maps M₁, M₂ against observed HrMSI Z
      3. Create binary selection mask B: B(i,j) = 1 if M₁ < M₂
      4. Fuse: X̂(i,j) = X̂₁(i,j) if B=1 else X̂₂(i,j)

    Parameters
    ----------
    X_hat1  : (B, C, H, W) — candidate 1 from stream 1
    X_hat2  : (B, C, H, W) — candidate 2 from stream 2
    hrMSI   : (B, c, H, W) — observed HrMSI Z (ground truth for comparison)
    srf_layer : trained SRFLayer (frozen)

    Returns
    -------
    X_fused : (B, C, H, W) — pixel-wise optimal fusion
    """
    with torch.no_grad():
        # Step 1: Spectrally degrade both candidates (Eq. 17)
        Z1 = srf_layer(X_hat1)   # (B, c, H, W)
        Z2 = srf_layer(X_hat2)   # (B, c, H, W)

        # Step 2: Pixel-wise RMSE maps (Eq. 18)
        # RMSE computed over spectral dimension → spatial error map
        M1 = ((Z1 - hrMSI) ** 2).mean(dim=1, keepdim=True).sqrt()  # (B, 1, H, W)
        M2 = ((Z2 - hrMSI) ** 2).mean(dim=1, keepdim=True).sqrt()  # (B, 1, H, W)

        # Step 3: Binary selection mask (Eq. 19)
        B_mask = (M1 < M2).float()   # (B, 1, H, W), 1 where candidate 1 is better

        # Step 4: Pixel-wise fusion (Eq. 20)
        X_fused = B_mask * X_hat1 + (1.0 - B_mask) * X_hat2

    return X_fused


# ─── SECTION 6: Full EDIP-Net Pipeline ───────────────────────────────────────

class EDIPNet(nn.Module):
    """
    EDIP-Net: Enhanced Deep Image Prior for Unsupervised HSI Super-Resolution.

    This class wraps all four components of the two-stage framework:
      ZSL Stage: DegradationLearningNet + InteractiveSpectralLearningNet
      DIG Stage: DoubleUShapeNet + degradation_aware_fusion (at inference)

    Training follows Algorithm 1:
      1. Optimise L₁ (DegradationLearningNet) to estimate PSF/SRF
      2. Optimise L₂ (ISLN) to generate coarse estimates X₁, X₂
      3. Generate X₁, X₂ by applying ISLN to (hrMSI, hrMSI)
      4. Optimise L₃ (DoubleUShapeNet) with frozen PSF/SRF
      5. Apply degradation-aware fusion at inference

    Parameters
    ----------
    config : EDIPConfig instance
    """

    def __init__(self, config: Optional[EDIPConfig] = None):
        super().__init__()
        cfg = config or EDIPConfig()
        self.cfg = cfg

        # Component 1: Degradation learning (ZSL stage, step 1)
        self.deg_net = DegradationLearningNet(
            lrHSI_bands=cfg.lrHSI_bands,
            hrMSI_bands=cfg.hrMSI_bands,
            psf_size=cfg.psf_size,
            scale_factor=cfg.scale_factor,
        )

        # Component 2: Interactive spectral learning (ZSL stage, step 2)
        self.spectral_net = InteractiveSpectralLearningNet(
            in_bands=cfg.hrMSI_bands,
            target_bands=cfg.lrHSI_bands,
            n_blocks=cfg.n_spectral_up,
        )

        # Component 3: Double U-shape generator (DIG stage)
        self.generator = DoubleUShapeNet(
            in_bands=cfg.lrHSI_bands,
            mid_ch=cfg.mid_ch,
        )

    def zsl_stage(
        self,
        lrHSI: Tensor,
        hrMSI: Tensor,
        device: torch.device,
        verbose: bool = False,
    ) -> Tuple[Tensor, Tensor]:
        """
        Execute the full ZSL stage (Algorithm 1, steps a–b).

        1. Trains degradation learning network to estimate PSF + SRF.
        2. Trains interactive spectral learning network on generated pairs.
        3. Generates coarse estimates X₁, X₂ from hrMSI.

        Returns
        -------
        X1, X2 : (1, C, H, W) scene-aware coarse HrHSI estimates
        """
        cfg = self.cfg

        # Step 1: Train degradation learning network
        opt_deg = torch.optim.Adam(
            list(self.deg_net.srf.parameters()) + list(self.deg_net.psf.parameters()),
            lr=cfg.lr_deg
        )
        sched_deg = torch.optim.lr_scheduler.LinearLR(
            opt_deg, start_factor=1.0, end_factor=0.0, total_iters=cfg.epochs_deg
        )
        self.deg_net.train()
        for ep in range(cfg.epochs_deg):
            opt_deg.zero_grad()
            _, _, loss_deg = self.deg_net(lrHSI, hrMSI)
            loss_deg.backward()
            opt_deg.step()
            sched_deg.step()
            if verbose and ep % 500 == 0:
                print(f"  [DEG] Epoch {ep}/{cfg.epochs_deg}  L1={loss_deg.item():.5f}")
        self.deg_net.eval()

        # Get training pairs from trained degradation network
        with torch.no_grad():
            K1, K2, _ = self.deg_net(lrHSI, hrMSI)

        # Step 2: Train interactive spectral learning network
        opt_spec = torch.optim.Adam(self.spectral_net.parameters(), lr=cfg.lr_spectral)
        sched_spec = torch.optim.lr_scheduler.LinearLR(
            opt_spec, start_factor=1.0, end_factor=0.0, total_iters=cfg.epochs_spec
        )
        self.spectral_net.train()
        for ep in range(cfg.epochs_spec):
            opt_spec.zero_grad()
            Y1_pred, Y2_pred = self.spectral_net(K1.detach(), K2.detach())
            loss_spec = F.l1_loss(Y1_pred, lrHSI) + F.l1_loss(Y2_pred, lrHSI)
            loss_spec.backward()
            opt_spec.step()
            sched_spec.step()
            if verbose and ep % 500 == 0:
                print(f"  [SPEC] Epoch {ep}/{cfg.epochs_spec}  L2={loss_spec.item():.5f}")
        self.spectral_net.eval()

        # Step 3: Apply spectral network to hrMSI to get coarse HrHSI estimates
        with torch.no_grad():
            X1, X2 = self.spectral_net(hrMSI, hrMSI)

        return X1, X2

    def dig_stage(
        self,
        X1: Tensor,
        X2: Tensor,
        lrHSI: Tensor,
        hrMSI: Tensor,
        verbose: bool = False,
    ) -> Tensor:
        """
        Execute the DIG stage (Algorithm 1, step a of DIG).

        Trains the double U-shape generator with frozen PSF/SRF parameters.
        Returns final fused HrHSI via degradation-aware decision fusion.

        Parameters
        ----------
        X1, X2 : (1, C, H, W) coarse estimates from ZSL stage
        lrHSI  : (1, C, h, w) observed LrHSI Y
        hrMSI  : (1, c, H, W) observed HrMSI Z

        Returns
        -------
        X_final : (1, C, H, W) final super-resolved HrHSI
        """
        cfg = self.cfg

        # Freeze PSF and SRF — use learned parameters for DIG loss only
        for p in self.deg_net.parameters():
            p.requires_grad = False

        opt_gen = torch.optim.Adam(self.generator.parameters(), lr=cfg.lr_gen)
        sched_gen = torch.optim.lr_scheduler.LinearLR(
            opt_gen, start_factor=1.0, end_factor=0.0, total_iters=cfg.epochs_gen
        )

        best_loss = float("inf")
        best_X1 = best_X2 = None

        self.generator.train()
        for ep in range(cfg.epochs_gen):
            opt_gen.zero_grad()
            X_hat1, X_hat2 = self.generator(X1.detach(), X2.detach())

            # L₃ = degradation consistency loss for both candidates (Eq. 16)
            # Spectral degradation: X̂ → HrMSI space (should match Z)
            Z_hat1 = self.deg_net.srf(X_hat1)
            Z_hat2 = self.deg_net.srf(X_hat2)

            # Spatial degradation: X̂ → LrHSI space (should match Y)
            # For spatial: need to downsample from HrHSI to LrHSI resolution
            Y_hat1 = F.avg_pool2d(X_hat1, kernel_size=cfg.scale_factor, stride=cfg.scale_factor)
            Y_hat2 = F.avg_pool2d(X_hat2, kernel_size=cfg.scale_factor, stride=cfg.scale_factor)

            loss_gen = (
                F.l1_loss(Z_hat1, hrMSI) + F.l1_loss(Y_hat1, lrHSI) +
                F.l1_loss(Z_hat2, hrMSI) + F.l1_loss(Y_hat2, lrHSI)
            )
            loss_gen.backward()
            opt_gen.step()
            sched_gen.step()

            if loss_gen.item() < best_loss:
                best_loss = loss_gen.item()
                best_X1 = X_hat1.detach().clone()
                best_X2 = X_hat2.detach().clone()

            if verbose and ep % 1000 == 0:
                print(f"  [GEN] Epoch {ep}/{cfg.epochs_gen}  L3={loss_gen.item():.5f}")

        # Apply degradation-aware decision fusion (Eqs. 17-20)
        X_final = degradation_aware_fusion(
            best_X1, best_X2, hrMSI, self.deg_net.srf
        )
        return X_final

    def reconstruct(
        self,
        lrHSI: Tensor,
        hrMSI: Tensor,
        device: torch.device = torch.device("cpu"),
        verbose: bool = True,
    ) -> Tensor:
        """
        Full EDIP-Net inference pipeline (Algorithm 1).

        Parameters
        ----------
        lrHSI  : (1, C, h, w) observed LrHSI Y
        hrMSI  : (1, c, H, W) observed HrMSI Z
        device : compute device

        Returns
        -------
        X_final : (1, C, H, W) reconstructed HrHSI
        """
        lrHSI = lrHSI.to(device)
        hrMSI = hrMSI.to(device)

        if verbose: print("\n── ZSL Stage: generating coarse estimates ──")
        X1, X2 = self.zsl_stage(lrHSI, hrMSI, device, verbose)

        if verbose: print("\n── DIG Stage: prior learning & fusion ──")
        X_final = self.dig_stage(X1, X2, lrHSI, hrMSI, verbose)

        return X_final


# ─── SECTION 7: Training Loop ─────────────────────────────────────────────────

def run_edip_reconstruction(
    lrHSI: Tensor,
    hrMSI: Tensor,
    config: Optional[EDIPConfig] = None,
    device_str: str = "cpu",
    verbose: bool = True,
) -> Tensor:
    """
    Convenience wrapper for end-to-end EDIP-Net reconstruction.

    For real use, replace the dummy dataset call below with:
      lrHSI = load_hyperspectral_image(path)    # (1, C, h, w)
      hrMSI  = load_multispectral_image(path)   # (1, c, H, W)

    Default config uses short epoch counts for fast smoke testing.
    For full paper-matching performance, use:
      EDIPConfig(epochs_deg=2000, epochs_spec=2000, epochs_gen=7000)
    """
    device = torch.device(device_str)
    cfg = config or EDIPConfig()

    model = EDIPNet(cfg).to(device)
    X_final = model.reconstruct(lrHSI, hrMSI, device=device, verbose=verbose)

    if verbose:
        print(f"\n✓ Reconstruction complete. Output shape: {tuple(X_final.shape)}")
    return X_final


# ─── SECTION 8: Evaluation Metrics ───────────────────────────────────────────

def compute_psnr(pred: Tensor, target: Tensor, data_range: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio."""
    mse = F.mse_loss(pred, target)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(data_range ** 2 / mse.item())


def compute_sam(pred: Tensor, target: Tensor, eps: float = 1e-8) -> float:
    """
    Spectral Angle Mapper (SAM) — mean angle in degrees between spectral vectors.

    pred, target: (B, C, H, W) or (C, H, W)
    Returns: mean SAM in degrees (lower is better)
    """
    if pred.dim() == 3:
        pred = pred.unsqueeze(0)
        target = target.unsqueeze(0)
    # (B, C, H*W) → dot product over spectral dim
    p = pred.reshape(pred.shape[0], pred.shape[1], -1)
    t = target.reshape(target.shape[0], target.shape[1], -1)
    dot = (p * t).sum(dim=1)                          # (B, H*W)
    norm_p = p.norm(dim=1).clamp(min=eps)
    norm_t = t.norm(dim=1).clamp(min=eps)
    cos_sim = (dot / (norm_p * norm_t)).clamp(-1 + eps, 1 - eps)
    angles = torch.acos(cos_sim) * 180 / math.pi     # degrees
    return angles.mean().item()


def compute_rmse(pred: Tensor, target: Tensor) -> float:
    """Root Mean Square Error."""
    return torch.sqrt(F.mse_loss(pred, target)).item()


def compute_ergas(
    pred: Tensor, target: Tensor, scale: int = 8
) -> float:
    """
    Relative Dimensionless Global Error in Synthesis (ERGAS).

    ERGAS = 100/r * sqrt(1/C * Σ_k (RMSE_k / mean_k)²)
    where r is the scale factor, C is the number of bands.
    """
    B, C, H, W = pred.shape
    ergas_sum = 0.0
    for c in range(C):
        rmse_c = torch.sqrt(F.mse_loss(pred[:, c], target[:, c])).item()
        mean_c = target[:, c].mean().item()
        if mean_c > 1e-8:
            ergas_sum += (rmse_c / mean_c) ** 2
    return (100 / scale) * math.sqrt(ergas_sum / C)


def evaluate_all(pred: Tensor, target: Tensor, scale: int = 8) -> Dict[str, float]:
    """Compute all six HISR metrics used in the paper."""
    return {
        "PSNR":  compute_psnr(pred, target),
        "SAM":   compute_sam(pred, target),
        "RMSE":  compute_rmse(pred, target),
        "ERGAS": compute_ergas(pred, target, scale),
        "CC":    torch.corrcoef(torch.stack([pred.flatten(), target.flatten()]))[0, 1].item(),
    }


# ─── SECTION 9: Dataset Helpers ──────────────────────────────────────────────

def make_dummy_hisr_pair(
    H: int = 64,
    W: int = 64,
    C: int = 46,
    c: int = 8,
    scale: int = 8,
    device: torch.device = torch.device("cpu"),
) -> Tuple[Tensor, Tensor]:
    """
    Generate a synthetic HrMSI-LrHSI pair from a random HrHSI for testing.

    Returns
    -------
    lrHSI : (1, C, H//scale, W//scale) — synthetic low-res hyperspectral
    hrMSI : (1, c, H, W)               — synthetic high-res multispectral
    """
    h, w = H // scale, W // scale
    # Synthetic HrHSI ground truth (values in [0, 1])
    hrHSI = torch.rand(1, C, H, W, device=device)

    # Simulate spatial degradation: blur + downsample
    lrHSI = F.avg_pool2d(hrHSI, kernel_size=scale, stride=scale)  # (1, C, h, w)
    lrHSI = lrHSI + 0.01 * torch.randn_like(lrHSI)

    # Simulate spectral degradation: random SRF projection
    srf = F.softmax(torch.randn(c, C, device=device), dim=1)
    hrMSI = torch.einsum('bchw,mc->bmhw', hrHSI, srf)
    hrMSI = hrMSI + 0.01 * torch.randn_like(hrMSI)

    return lrHSI, hrMSI


def load_houston_pair(data_dir: str) -> Tuple[Tensor, Tensor]:
    """
    Placeholder loader for the Houston hyperspectral dataset.

    Dataset: https://hyperspectral.ee.uh.edu/
    Specs:   HrHSI 400×400×46, scale factor 8, WorldView-2 SRF (8 bands)

    Replace the body of this function with your actual data loading code.
    Expected format: HrHSI numpy array of shape (H, W, C), values in [0, 1].
    """
    raise NotImplementedError(
        "Download the Houston dataset from https://hyperspectral.ee.uh.edu/ "
        "and implement this loader to return (lrHSI, hrMSI) tensors."
    )


def load_washington_pair(data_dir: str) -> Tuple[Tensor, Tensor]:
    """
    Placeholder loader for the Washington DC Mall dataset.
    https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html
    HrHSI 300×300×191, scale factor 10, WorldView-2 SRF (8 bands)
    """
    raise NotImplementedError("Implement Washington DC Mall data loader.")


def load_tiangong_pair(data_dir: str) -> Tuple[Tensor, Tensor]:
    """
    Placeholder loader for the TianGong-1 dataset.
    https://www.msadc.cn/dataHome
    HrHSI 240×240×54, scale factor 12, WorldView-2 SRF (8 bands)
    """
    raise NotImplementedError("Implement TianGong-1 data loader.")


def load_chikusei_pair(data_dir: str) -> Tuple[Tensor, Tensor]:
    """
    Placeholder loader for the Chikusei dataset.
    http://naotoyokoya.com/Download.html
    HrHSI 400×400×110, scale factor 16, WorldView-2 SRF (8 bands)
    """
    raise NotImplementedError("Implement Chikusei data loader.")


# ─── SECTION 10: Smoke Test ───────────────────────────────────────────────────

if __name__ == "__main__":
    print("=" * 60)
    print("EDIP-Net — Full Architecture Smoke Test")
    print("=" * 60)
    torch.manual_seed(42)
    device = torch.device("cpu")

    # Small dimensions for fast smoke testing
    H, W, C, c, scale = 32, 32, 16, 4, 4

    # ── 1. Generate synthetic HrMSI-LrHSI pair ─────────────────────────────
    print("\n[1/5] Generating synthetic HSI pair...")
    lrHSI, hrMSI = make_dummy_hisr_pair(H=H, W=W, C=C, c=c, scale=scale, device=device)
    print(f"  lrHSI: {tuple(lrHSI.shape)}  hrMSI: {tuple(hrMSI.shape)}")

    # ── 2. Test degradation learning network ───────────────────────────────
    print("\n[2/5] Degradation learning network forward pass...")
    cfg = EDIPConfig(
        lrHSI_bands=C, hrMSI_bands=c, scale_factor=scale, psf_size=4,
        epochs_deg=3, epochs_spec=3, epochs_gen=3,  # tiny for smoke test
        mid_ch=16, n_spectral_up=2,
    )
    model = EDIPNet(cfg).to(device)
    K1, K2, loss_deg = model.deg_net(lrHSI, hrMSI)
    print(f"  K1: {tuple(K1.shape)}  K2: {tuple(K2.shape)}  L1={loss_deg.item():.5f}")

    # ── 3. Test interactive spectral learning network ──────────────────────
    print("\n[3/5] Interactive spectral learning network...")
    Y1, Y2 = model.spectral_net(K1.detach(), K2.detach())
    print(f"  Y1: {tuple(Y1.shape)}  Y2: {tuple(Y2.shape)}")
    assert Y1.shape == lrHSI.shape, f"Y1 shape mismatch: {Y1.shape} vs {lrHSI.shape}"

    # ── 4. Test double U-shape generator ──────────────────────────────────
    print("\n[4/5] Double U-shape generator forward pass...")
    X1_coarse = Y1.detach()
    X2_coarse = Y2.detach()
    # Upsample coarse estimates to HrHSI resolution for generator input
    X1_hr = F.interpolate(X1_coarse, size=(H, W), mode="bilinear", align_corners=True)
    X2_hr = F.interpolate(X2_coarse, size=(H, W), mode="bilinear", align_corners=True)
    X_hat1, X_hat2 = model.generator(X1_hr, X2_hr)
    print(f"  X̂₁: {tuple(X_hat1.shape)}  X̂₂: {tuple(X_hat2.shape)}")
    assert X_hat1.shape == (1, C, H, W)

    # ── 5. Test fusion and metrics ─────────────────────────────────────────
    print("\n[5/5] Degradation-aware fusion + metrics...")
    X_fused = degradation_aware_fusion(
        X_hat1.detach(), X_hat2.detach(), hrMSI, model.deg_net.srf
    )
    print(f"  X_fused: {tuple(X_fused.shape)}")

    # Compute metrics against a random "ground truth" for shape validation
    gt = torch.rand_like(X_fused)
    metrics = evaluate_all(X_fused, gt, scale=scale)
    print(f"  Metrics: PSNR={metrics['PSNR']:.2f} dB | SAM={metrics['SAM']:.4f} | RMSE={metrics['RMSE']:.4f}")

    print("\n" + "=" * 60)
    print("✓  All checks passed. EDIP-Net is ready for use.")
    print("=" * 60)
    print("""
Next steps:
  1. Download datasets:
       Houston        : https://hyperspectral.ee.uh.edu/
       Washington DC  : https://engineering.purdue.edu/~biehl/MultiSpec/
       TianGong-1     : https://www.msadc.cn/dataHome
       Chikusei       : http://naotoyokoya.com/Download.html
       Liao Ning-01   : https://drive.google.com/drive/folders/1JLCCB6ld5R49HDLN5SsMISx1d0fuqRjO

  2. Implement the data loaders in Section 9 (load_houston_pair etc.)

  3. Run full reconstruction with paper's epoch settings:
       cfg = EDIPConfig(epochs_deg=2000, epochs_spec=2000, epochs_gen=7000)
       X_final = run_edip_reconstruction(lrHSI, hrMSI, config=cfg, device_str='cuda')

  4. Official code: https://github.com/JiaxinLiCAS
""")

Read the Full Paper & Access the Code

The complete study — including full per-dataset quantitative tables, spectral band PSNR curves, real-world Liao Ning-01 satellite results, and scale factor robustness analysis — is available on IEEE Xplore. Official code is on GitHub.

📄 Read the Paper (IEEE TGRS) 💻 Official Code (GitHub)

Academic Citation:
Li, J., Zheng, K., Gao, L., Han, Z., Li, Z., & Chanussot, J. (2025). Enhanced deep image prior for unsupervised hyperspectral image super-resolution. IEEE Transactions on Geoscience and Remote Sensing, 63, 5504218. https://doi.org/10.1109/TGRS.2025.3531646

This article is an independent editorial analysis of peer-reviewed research. The PyTorch implementation is an educational adaptation; the original authors used an Adam optimizer with linear decay on all three subnets. For exact reproduction refer to the official code at https://github.com/JiaxinLiCAS. All benchmark datasets are publicly available at the links provided in Section 9.

Explore More on AI Trend Blend

If this article sparked your interest, here is more of what we cover across the site — from agricultural AI and precision farming to adversarial robustness, computer vision, and efficient model design.

HomepageLatest AI research breakdowns and trending papers

→

Machine LearningArchitecture reviews, training techniques, benchmarks

→

Computer VisionObject detection, segmentation, 3D understanding

→

AI SecurityAdversarial attacks, robustness, federated defences

→

Agricultural AIPrecision farming, plant phenotyping, crop science

→

AboutWho we are and how we cover AI research

→