GEF: Gaussian Entropy Fields for 3D Surface Reconstruction | AI Trend Blend

3D Computer Vision · ISPRS Journal of Photogrammetry and Remote Sensing 236 (2026) 273–285 · 24 min read

GEF: What If the Secret to Better 3D Reconstruction Was Treating Surface Uncertainty as Entropy?

Researchers at Shandong University of Science and Technology stopped asking Gaussian primitives to be geometrically correct and instead asked them to be informationally minimal. Their Gaussian Entropy Fields framework achieves the best Chamfer Distance among all Gaussian Splatting methods on DTU and the top SSIM and LPIPS on Mip-NeRF 360 — without ever explicitly telling a primitive what a surface is.

GEF Gaussian Entropy Fields 3D Gaussian Splatting Surface Reconstruction Entropy Regularization SNRI Novel View Synthesis DTU Dataset Mip-NeRF 360 UAV Photogrammetry

3D Gaussian Splatting arrived in 2023 and immediately redrew the benchmark charts for real-time novel view synthesis. But it carried a structural flaw: Gaussian primitives optimize toward photometric accuracy, not geometric correctness. Nothing in the base formulation stops thousands of semi-transparent splats from stacking on top of each other along a viewing ray, each contributing a little opacity, none committing to a definite surface. The resulting 3D reconstruction looks fine in rendered images but falls apart when you ask it to produce a mesh. Gaussian Entropy Fields (GEF) from Hong Kuang and Jianchen Liu at Shandong USTA reframes this as an information-theoretic problem: a well-reconstructed surface is one whose primitive distribution has low entropy. You stop telling Gaussians where to go and start penalizing them for being uncertain about where they are.

The Root Cause of 3DGS Surface Failures

To understand why GEF works, you first need to understand exactly what goes wrong in vanilla 3DGS surface reconstruction. During alpha-compositing rendering, a pixel’s color is computed by summing the contributions of all Gaussians sorted by depth: each primitive contributes its color, weighted by its opacity and the cumulative transparency of everything in front of it. The photometric loss gradient can be satisfied by many different arrangements of Gaussians — a single opaque Gaussian on the surface, two semi-transparent Gaussians slightly offset from each other, five low-opacity Gaussians spread across half a meter of depth. All of these can produce nearly identical rendered images. The optimizer has no reason to prefer the compact single-surface arrangement over the diffuse multi-Gaussian blob.

This is the geometric ambiguity problem, and it manifests visually as primitive aliasing: elongated Gaussian splats that stretch along viewing directions, clusters of redundant primitives floating slightly off a surface, and fuzzy meshes when depth maps are fused via TSDF. Prior work attacked this by flattening Gaussians into 2D disks (2DGS), adding normal consistency losses (PGSR), or introducing signed distance field guidance. GEF takes a different angle entirely: instead of specifying what a surface should look like geometrically, it specifies what a primitive distribution should look like informationally.

The Central Insight

When Gaussian primitives are well-aligned on a surface, the accumulated opacity along any viewing ray through that surface forms a sharp, concentrated spike — like a Dirac delta. When primitives are stacked redundantly, the accumulated opacity forms a broad, multi-peaked distribution. The Shannon entropy of the normalized opacity distribution is therefore a direct, computationally tractable proxy for surface ambiguity. Minimizing local opacity entropy is equivalent to promoting surface determinism.

GEF Architecture — Three Complementary Components

BASE: 3D Gaussian Splatting pipeline (position μ, covariance Σ, color c, opacity α)
      │
      ├─── Standard photometric loss (L1 + SSIM on rendered vs. ground-truth RGB)
      │
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│  COMPONENT 1: ENTROPY-REGULARIZED SPARSITY CONSTRAINT (Section 3.2)            │
│                                                                                  │
│  Opacity Field Along Ray:                                                        │
│    α(t) = Σ_i  w_i · N(t | μ_i, σ²_i)    [mixture of Gaussians on ray]         │
│    ã(t) = α(t) / ‖α‖₁                    [normalize to pseudo-distribution]    │
│    H(α) = -∫ ã(t) log ã(t) dt            [differential entropy = ambiguity]    │
│                                                                                  │
│  3D Neighborhood Formulation (1280× faster than per-ray):                       │
│    Ωk = K-nearest neighbors of primitive k  (K=50, fixed at iter 20,000)        │
│                                                                                  │
│  SNRI (Surface Neighborhood Redundancy Index):                                   │
│    SNRI(i) = (1/K) Σ_j exp(-α[ε - |n_i · (μ_j - μ_i)|]+)                      │
│      → Low SNRI ≈ 0.1 (primitives spread tangentially — keep them)              │
│      → High SNRI ≈ 0.9 (primitives stacked along normal — sparsify them)        │
│                                                                                  │
│  Image Entropy-Guided Weight:                                                    │
│    E_img(x,y) = -Σ p(u,v) log₂ p(u,v)   [sliding window k×k]                  │
│    E_img_k = Σ α_k(x,y)·E_img(x,y) / Σ α_k(x,y)  [primitive-level entropy]    │
│    w_img_k = 1 - (E_img_k - E_min)/(E_max - E_min)  [inverse: complex→relax]   │
│                                                                                  │
│  Sparsity Loss:                                                                  │
│    L_sparsity = Σ_k  w_img_k · max(0, H(α_k) - η_k)²                           │
│    η_k = ½ ln(2πe σ²_min) + ε + β·SNRI_k   [adaptive entropy threshold]        │
└─────┬──────────────────────────────────────────────────────────────────────────┘
      │
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│  COMPONENT 2: MULTI-SCALE ENTROPY ALIGNMENT (Section 3.3)                       │
│                                                                                  │
│  Multi-Scale Image Entropy Maps:                                                 │
│    Scale l=1: window k_1 = 3×3   (fine: textures, sharp edges)                  │
│    Scale l=2: window k_2 = 5×5   (medium: intermediate structure)               │
│    Scale l=3: window k_3 = 9×9   (coarse: smooth regions, backgrounds)          │
│                                                                                  │
│  Competitive Cross-Scale Softmax Weighting (β=6.0):                             │
│    W_l(x,y) = exp(β·E^l_norm) / Σ_j exp(β·E^j_norm)                            │
│    → Each spatial location "won" by its most informative scale                  │
│    → >85% of pixels have max weight > 0.6 (strong spatial specialization)       │
│                                                                                  │
│  Alignment Loss:                                                                 │
│    L_align = Σ_l λ_l Σ_i  W^l_i · |H^l_i - E^l_i|²                            │
│    (H = rendered opacity entropy, E = image entropy target)                     │
│                                                                                  │
│  Delayed Entropy Updates (efficiency):                                           │
│    θ̇ = -∇_θ L(θ, H_cached)   [optimize under fixed entropy]                    │
│    H_cached updated every N=10-20 iterations  [3-4× speedup]                    │
└─────┬──────────────────────────────────────────────────────────────────────────┘
      │
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│  COMPONENT 3: GEOMETRIC REGULARIZATION (Section 3.4)                            │
│                                                                                  │
│  Depth Consistency Loss (from 2DGS):                                             │
│    L_depth = Σ_{i,j∈r} ω_i ω_j ‖d_i - d_j‖²₂                                  │
│                                                                                  │
│  Normal Consistency Loss:                                                        │
│    L_normal = Σ_{i∈r} ω_i (1 - n_i^⊤ n')                                       │
│    (n' = surface normal from finite differences on depth map)                    │
│                                                                                  │
│  TOTAL LOSS:                                                                     │
│    L = L_photometric + λ_s·L_sparsity + λ_a·L_align + L_depth + L_normal       │
│                                                                                  │
│  Training schedule:                                                              │
│    iter 0–15,000:  photometric only (standard 3DGS)                             │
│    iter 15,000+:   + geometric regularization                                    │
│    iter 20,000+:   + entropy field losses (k-NN graph fixed here)               │
│    iter 30,000:    done → TSDF fusion → Marching Cubes mesh                     │
└─────────────────────────────────────────────────────────────────────────────────┘

Component 1: Entropy-Regularized Sparsity — The Heart of GEF

From Ray Entropy to Neighborhood Entropy

The paper starts with a beautifully intuitive formulation: normalize the accumulated opacity along a viewing ray into a pseudo-probability distribution, then compute its differential entropy. High entropy means the opacity mass is spread across many depth positions — multiple competing Gaussian clusters, no confident surface. Low entropy means the opacity is concentrated at a single depth — one dominant primitive, a clear surface. Minimizing this entropy over all rays would give you geometrically consistent surfaces as an emergent property of information-theoretic optimization.

The problem is scale. With roughly a million rays per training iteration and 128 samples per ray, direct ray-based entropy computation requires 6.4 billion operations per iteration. That’s not a practical training budget. The key observation that rescues the idea is that ray-based opacity disorder and 3D neighborhood opacity disorder are the same phenomenon viewed from different perspectives. Primitives that stack along viewing directions (creating multi-peaked ray opacity profiles) are exactly the primitives that cluster in the normal direction within 3D spatial neighborhoods. So instead of sampling each ray, GEF partitions the scene into overlapping K-nearest-neighbor neighborhoods around each primitive and computes entropy of the normalized opacity histogram within each neighborhood. Same redundancy, same sparsity signal — but only 5 million operations instead of 6.4 billion, a 1,280× speedup.

SNRI: Not All Stacked Primitives Are Redundant

The naive version of neighborhood entropy would penalize any cluster of primitives, including tangentially-distributed primitives that legitimately cover a curved surface from different angles. GEF introduces the Surface Neighborhood Redundancy Index to distinguish these two cases.

Eq. 6 — SNRI $$\text{SNRI}(i) = \frac{1}{K}\sum_{j=1}^{K} e^{-\alpha[\epsilon – |\mathbf{n}_i \cdot ({\mu}_j – {\mu}_i)|]_+}$$

For each primitive $i$, SNRI computes how far each neighbor $j$ deviates from the surface plane defined by $i$’s normal $\mathbf{n}_i$. The projected distance $d_j = |\mathbf{n}_i \cdot ({\mu}_j – {\mu}_i)|$ measures normal-direction displacement. If $d_j$ is small (neighbor lies in the tangent plane), that neighbor contributes a low value to SNRI — it’s a legitimate surface coverage. If $d_j$ is large (neighbor is stacked along the normal), it contributes close to 1 — it’s a true redundancy candidate. Averaging over $K$ neighbors gives a score in $[0,1]$: near 0 for tangentially-spread primitives, near 1 for normal-direction stacks.

SNRI feeds directly into the adaptive entropy threshold $\eta_k$ — the maximum entropy a neighborhood is allowed to have before the loss kicks in. High-SNRI neighborhoods get a lower threshold (they’re penalized earlier for having any entropy), while low-SNRI neighborhoods get a higher threshold (they’re given more slack because some complexity there is legitimate). The coupling weight $\beta = -\sigma^2_{\min}$ ensures this scaling is normalized relative to primitive scale.

Image Entropy-Guided Weighting: Letting the Input Images Set the Rules

One fundamental tension in geometric regularization is between smoothness and detail. A flat concrete wall should be aggressively sparsified — you want one clean surface, no noise. A bicycle wheel with thin spokes needs flexibility — too much sparsification will erase the spokes entirely. Prior methods set this tradeoff by hand with global hyperparameters. GEF reads it from the input images themselves.

Eq. 7–9 — Image Entropy Weighting $$E_\text{img}(x,y) = -\sum_{(u,v)\in\Omega_{k\times k}(x,y)} p(u,v)\log_2 p(u,v)$$ $$E_{\text{img}_k} = \frac{\sum_{(x,y)} \alpha_k(x,y)\cdot E_{\text{img}}(x,y)}{\sum_{(x,y)}\alpha_k(x,y)}$$ $$w_{\text{img}_k} = 1 – \frac{E_{\text{img}_k} – E_{\min}}{E_{\max} – E_{\min}}$$

High image entropy at a pixel means that region is visually complex — lots of edges, fine texture, rich structure. Low image entropy means visual simplicity — smooth gradients, uniform color. After projecting each primitive’s image entropy (a weighted average over its influenced pixels), the inverse normalization in Eq. 9 maps high-entropy regions to low weights in the sparsity loss. The optimizer therefore applies less sparsification pressure where the images indicate complexity, and more where they indicate simplicity. The entire parameter tuning burden — how aggressively to sparsify different scene regions — is absorbed by the image data.

Component 2: Multi-Scale Entropy Alignment — Preventing Over-Smoothing at Boundaries

Even a well-calibrated sparsity loss can introduce excessive smoothing at geometric discontinuities: sharp edges, thin structures, surfaces with rapid curvature changes. GEF’s second component addresses this by enforcing cross-modal entropy distribution matching — the rendered 3D model’s opacity entropy profile should resemble the 2D image entropy profile at corresponding locations, across multiple spatial scales simultaneously.

Three scales of image entropy maps are computed: a fine scale (3×3 window) that captures textures and object boundaries, a medium scale (5×5 window) for intermediate structural features, and a coarse scale (9×9 window) for smooth regions and backgrounds. Rather than blending these uniformly, GEF uses a competitive softmax weighting with temperature parameter $\beta = 6.0$:

Eq. 13–14 — Competitive Cross-Scale Weighting $$W_l(x,y) = \frac{\exp(\beta \cdot E^l_\text{norm}(x,y))}{\sum_{j=1}^{L}\exp(\beta \cdot E^j_\text{norm}(x,y))}$$ $$\mathcal{L}_\text{align} = \sum_{l=1}^{L}\lambda_l \sum_{i=1}^{N} W^l_i \bigl|H^l_i – E^l_i\bigr|^2$$

The competitive mechanism means each spatial location gets dominated by one scale — the scale at which that location’s structural information is most concentrated. The paper validates this empirically: over 85% of pixels have a maximum weight above 0.6, confirming strong spatial specialization rather than diffuse blending. Fine scales win in textured regions and along sharp edges, preventing entropy minimization from flattening them. Coarse scales win in smooth regions, ensuring aggressive sparsification there. The temperature $\beta = 6.0$ is the sweet spot between insufficient competition (scales blur together) and training instability (gradients become too sharp).

A practical efficiency trick makes this viable in training: entropy maps are computationally expensive to recompute every iteration. GEF exploits the fact that configurational entropy evolves much slower than individual parameter values — during a short training window, the geometry barely moves even though the photometric loss drives rapid color changes. Entropy maps are therefore cached and updated only every N=10–20 iterations, achieving a 3–4× training speedup with less than 1% quality degradation.

“Well-reconstructed surfaces can be characterized by low configurational entropy, where dominant primitives clearly define surface geometry while redundant components are suppressed.” — Kuang & Liu, ISPRS Journal of Photogrammetry and Remote Sensing, 2026

Results: What Entropy-Driven Optimization Actually Achieves

Mip-NeRF 360 — Rendering Quality

Method	Indoor SSIM ↑	Indoor LPIPS ↓	Outdoor SSIM ↑	Outdoor LPIPS ↓	Avg SSIM ↑	Avg LPIPS ↓
3DGS	0.926	0.199	0.705	0.283	0.803	0.246
2DGS	0.923	0.183	0.709	0.248	0.804	0.239
GOF	0.928	0.167	0.742	0.225	0.835	0.196
GEF (ours)	0.944	0.084	0.783	0.177	0.855	0.136

DTU Surface Reconstruction — Chamfer Distance

Method	Type	Mean CD ↓	Training Time
NeRF	Implicit	1.49	>12 h
NeuS	Implicit	0.84	>12 h
Neuralangelo	Implicit	0.61	>12 h
3DGS	Explicit	1.96	5.2 min
2DGS	Explicit	0.87	8.9 min
GOF	Explicit	0.82	55 min
GEF (ours)	Explicit	0.64	23 min

Tanks and Temples — F1 Score

Scene	NeuS	2DGS	GOF	GEF
Barn	0.29	0.45	0.51	0.48
Ignatius	0.83	0.50	0.68	0.75
Truck	0.45	0.48	0.58	0.56
Mean	0.38	0.32	0.46	0.44
Time	>24 h	15.5 min	>1 h	50 min

A few things stand out in these numbers. On Mip-NeRF 360, GEF is not the best in PSNR (it ranks third at 27.40) — the entropy computation introduces a small amount of noise that slightly hurts pixel-wise accuracy. But it dominates in SSIM and LPIPS by a considerable margin, which means the rendered images are perceptually superior: sharper structural detail, better preservation of fine-grained texture, fewer ghosting artifacts. The distinction between pixel accuracy and perceptual quality is meaningful, and GEF firmly prioritizes the latter.

On DTU Chamfer Distance, GEF achieves 0.64 — best among all Gaussian-Splatting-based methods and competitive with Neuralangelo (0.61), which requires over 12 hours of training. GEF trains in 23 minutes. That’s the efficiency argument for entropy-based sparsification over implicit neural surface methods.

Ablation: Every Component Earns Its Place

Setting	F1 ↑	PSNR ↑	SSIM ↑	LPIPS ↓
w/o Sparsity Constraint	0.37	23.51	0.850	0.196
w/o Image Entropy Guidance	0.40	24.62	0.865	0.170
w/o Multi-Scale Alignment	0.42	25.22	0.872	0.157
Full GEF model	0.44	25.89	0.870	0.150

The sparsity constraint removal causes the largest F1 drop (0.44 → 0.37, a 15.9% collapse), confirming it is the foundation of geometric quality. Removing image entropy guidance drops F1 by 9.1% — uniform sparsification breaks spatial coherence across complex regions. Multi-scale alignment removal costs 4.5% in F1 — cross-scale entropy matching prevents the smoothing that would otherwise erase fine geometric detail. The gains are synergistic rather than additive, meaning each component addresses a failure mode the others cannot compensate for.

Complete End-to-End GEF Implementation (PyTorch)

The implementation covers all components from the paper in 12 sections: 3D Gaussian primitive representation and differentiable alpha-blending, the SNRI (Surface Neighborhood Redundancy Index) computation, image entropy-guided weighting, the entropy-regularized sparsity loss, multi-scale entropy alignment with competitive softmax weighting, delayed entropy caching for training efficiency, depth and normal consistency losses, the complete GEF loss function, mesh extraction via TSDF fusion and Marching Cubes, the full GEF trainer, hyperparameter sensitivity utilities, and a smoke test.

# ==============================================================================
# GEF: Gaussian Entropy Fields — Driving Adaptive Sparsity in 3D Gaussian Optimization
# Paper: ISPRS Journal of Photogrammetry and Remote Sensing 236 (2026) 273-285
# Authors: Hong Kuang, Jianchen Liu
# Affiliation: Shandong University of Science and Technology, China
# DOI: https://doi.org/10.1016/j.isprsjprs.2026.04.010
# ==============================================================================
# Sections:
#   1.  Imports & Configuration
#   2.  3D Gaussian Primitive Representation (Eq. 1-3)
#   3.  Differentiable Alpha-Blending Renderer
#   4.  SNRI: Surface Neighborhood Redundancy Index (Eq. 6)
#   5.  Image Entropy Computation (Eq. 7-9)
#   6.  Opacity Entropy in 3D Neighborhoods (Eq. 4-5, 10-11)
#   7.  Multi-Scale Entropy Alignment (Eq. 13-16)
#   8.  Delayed Entropy Cache (Eq. 15-16)
#   9.  Depth and Normal Consistency Losses (Eq. 12, 17)
#  10.  Complete GEF Loss Function (Eq. 18-19)
#  11.  GEF Trainer + Optimization Schedule
#  12.  Mesh Extraction (TSDF + Marching Cubes) & Smoke Test
# ==============================================================================

from __future__ import annotations
import math, warnings
from typing import Dict, List, Optional, Tuple
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
import numpy as np
warnings.filterwarnings("ignore")


# ─── SECTION 1: Configuration ─────────────────────────────────────────────────

class GEFConfig:
    """
    GEF hyperparameters.
    All defaults match the paper's experimental setup.
    """
    # Scene initialization
    num_gaussians: int = 100_000   # initial primitive count (grows via densification)
    img_h: int = 540
    img_w: int = 960

    # Neighborhood entropy
    K: int = 50                     # K-nearest neighbors per primitive (Eq. 10)
    sigma_min: float = 0.005        # minimum primitive scale σ_min
    eps_snri: float = 0.5           # ε for SNRI threshold (in units of σ_min)
    alpha_snri: float = 200.0       # α decay steepness in SNRI (= 1/σ²_min)
    eps_entropy: float = 1e-6       # numerical stability for entropy

    # Multi-scale entropy alignment
    num_scales: int = 3             # L=3: fine(k=3), medium(k=5), coarse(k=9)
    beta_temperature: float = 6.0   # β in competitive softmax weighting (Eq. 13)
    lambda_scales: List = [1.0, 1.0, 1.0]  # λ_l per scale
    entropy_update_interval: int = 10  # N: update cached entropy every N iters

    # Loss weights
    lambda_sparsity: float = 0.1    # λ_sparsity
    lambda_align: float = 0.05      # λ_align
    lambda_depth: float = 100.0     # depth consistency weight
    lambda_normal: float = 0.05     # normal consistency weight
    lambda_ssim: float = 0.2        # SSIM weight in photometric loss

    # Training schedule (iter counts)
    geom_start_iter: int = 15_000   # geometric regularization start
    entropy_start_iter: int = 20_000 # entropy field start + k-NN graph fixed
    total_iters: int = 30_000

    # Mesh extraction
    tsdf_voxel_size: float = 0.02   # TSDF voxel size for mesh extraction

    def __init__(self, **kwargs):
        for k, v in kwargs.items(): setattr(self, k, v)

    @property
    def epsilon(self):
        return self.eps_snri * self.sigma_min

    @property
    def beta_coupling(self):
        return -self.sigma_min ** 2


# ─── SECTION 2: 3D Gaussian Primitive Representation ──────────────────────────

class GaussianPrimitives(nn.Module):
    """
    Learnable 3D Gaussian Splatting scene representation (Eq. 1-3).

    Each Gaussian primitive is parameterized by:
      μ  ∈ R³    : center position (world coordinates)
      s  ∈ R³    : log-scale (exponentiated to get scale vector)
      q  ∈ R⁴    : quaternion rotation (normalized → rotation matrix R)
      c  ∈ R³    : RGB color (spherical harmonic degree 0: view-independent)
      α  ∈ R      : pre-sigmoid opacity logit

    Covariance: Σ = R·diag(s)·diag(s)ᵀ·Rᵀ (Eq. 1)
    2D projection: Σ' = JWΣWᵀJᵀ (Eq. 2)
    Alpha-blending: C(p') = Σ c_i ω_i Π(1-ω_j) (Eq. 3)

    All parameters are optimized end-to-end via photometric loss.
    """

    def __init__(self, n: int, init_scale: float = 0.01):
        super().__init__()
        self.register_parameter('means',   nn.Parameter(torch.randn(n, 3) * 0.5))
        self.register_parameter('log_scales', nn.Parameter(torch.full((n, 3), math.log(init_scale))))
        self.register_parameter('quats',    nn.Parameter(self._init_quats(n)))
        self.register_parameter('colors',   nn.Parameter(torch.rand(n, 3)))
        self.register_parameter('opacity_logits', nn.Parameter(torch.zeros(n)))

    @staticmethod
    def _init_quats(n: int) -> Tensor:
        """Initialize quaternions to identity rotation."""
        q = torch.zeros(n, 4)
        q[:, 0] = 1.0   # w=1, x=y=z=0 → identity rotation
        return q

    @property
    def scales(self) -> Tensor:
        """Positive scale vector (exponentiated log-scales)."""
        return torch.exp(self.log_scales)       # (N, 3), always positive

    @property
    def opacities(self) -> Tensor:
        """Opacity in [0, 1] via sigmoid."""
        return torch.sigmoid(self.opacity_logits)  # (N,)

    @property
    def normals(self) -> Tensor:
        """
        Surface normal = principal axis (eigenvector corresponding to
        smallest eigenvalue of covariance matrix Σ).
        For a diagonal covariance, this is the axis with smallest scale.
        Uses quaternion → rotation matrix, then picks the min-scale column.
        """
        R = self._quat_to_rot(self.quats)                  # (N, 3, 3)
        min_idx = self.scales.argmin(dim=-1)               # (N,) which axis is thinnest
        batch_idx = torch.arange(self.means.shape[0], device=self.means.device)
        normals = R[batch_idx, :, min_idx]                   # (N, 3)
        return F.normalize(normals, dim=-1)

    def covariance_3d(self) -> Tensor:
        """Compute 3D covariance matrices Σ = R·S·Sᵀ·Rᵀ (Eq. 1)."""
        R = self._quat_to_rot(self.quats)                   # (N, 3, 3)
        S = torch.diag_embed(self.scales)                   # (N, 3, 3) diagonal
        RS = torch.bmm(R, S)                                  # (N, 3, 3)
        return torch.bmm(RS, RS.transpose(-2, -1))           # (N, 3, 3) = RSSᵀRᵀ

    @staticmethod
    def _quat_to_rot(q: Tensor) -> Tensor:
        """
        Convert unit quaternions to 3×3 rotation matrices.
        q: (N, 4) as (w, x, y, z), normalized.
        """
        q = F.normalize(q, dim=-1)
        w, x, y, z = q.unbind(dim=-1)
        R = torch.stack([
            1 - 2*(y*y + z*z),  2*(x*y - w*z),      2*(x*z + w*y),
            2*(x*y + w*z),      1 - 2*(x*x + z*z),  2*(y*z - w*x),
            2*(x*z - w*y),      2*(y*z + w*x),      1 - 2*(x*x + y*y),
        ], dim=-1).reshape(-1, 3, 3)
        return R


# ─── SECTION 3: Differentiable Alpha-Blending Renderer ────────────────────────

class SimpleGaussianRenderer(nn.Module):
    """
    Simplified differentiable renderer for GEF training (Eq. 2-3).

    This implementation uses ray-casting for correctness and clarity.
    For production: use the official 3DGS CUDA rasterizer from
    https://github.com/graphdeco-inria/gaussian-splatting

    Per-pixel color: C(p') = Σ_i c_i ω_i Π_{j

    def __init__(self, cfg: GEFConfig):
        super().__init__()
        self.cfg = cfg

    def project_gaussians(
        self,
        gaussians: GaussianPrimitives,
        camera_intrinsics: Dict[str, float],
        camera_pose: Tensor,         # (4, 4) world-to-camera transform W
    ) -> Tuple[Tensor, Tensor, Tensor]:
        """
        Project 3D Gaussians to 2D image plane (Eq. 2).
        Returns: 2D means (N, 2), 2D covariances (N, 2, 2), depths (N,)
        """
        N = gaussians.means.shape[0]
        fx, fy = camera_intrinsics['fx'], camera_intrinsics['fy']
        cx, cy = camera_intrinsics['cx'], camera_intrinsics['cy']

        # Transform means to camera space
        ones = torch.ones(N, 1, device=gaussians.means.device)
        means_h = torch.cat([gaussians.means, ones], dim=-1)  # (N, 4)
        means_cam = (camera_pose @ means_h.T).T[:, :3]          # (N, 3)
        depths = means_cam[:, 2].clamp(min=0.01)               # (N,)

        # Project to image plane
        u = fx * means_cam[:, 0] / depths + cx                  # (N,)
        v = fy * means_cam[:, 1] / depths + cy                  # (N,)
        means_2d = torch.stack([u, v], dim=-1)                   # (N, 2)

        # Project 3D covariance to 2D via J·W·Σ·Wᵀ·Jᵀ (Eq. 2)
        W_rot = camera_pose[:3, :3]                              # (3, 3)
        Sigma_3d = gaussians.covariance_3d()                       # (N, 3, 3)
        Sigma_cam = W_rot @ Sigma_3d @ W_rot.T                     # (N, 3, 3) broadcast
        # Jacobian of perspective projection
        J = torch.zeros(N, 2, 3, device=gaussians.means.device)
        J[:, 0, 0] = fx / depths
        J[:, 0, 2] = -fx * means_cam[:, 0] / (depths**2)
        J[:, 1, 1] = fy / depths
        J[:, 1, 2] = -fy * means_cam[:, 1] / (depths**2)
        Sigma_2d = torch.bmm(torch.bmm(J, Sigma_cam), J.transpose(-2, -1))  # (N, 2, 2)
        # Add small regularization for numerical stability
        eye2 = torch.eye(2, device=Sigma_2d.device).unsqueeze(0) * 0.3
        Sigma_2d = Sigma_2d + eye2

        return means_2d, Sigma_2d, depths

    def render(
        self,
        gaussians: GaussianPrimitives,
        camera_intrinsics: Dict,
        camera_pose: Tensor,
        H: int, W: int,
    ) -> Dict[str, Tensor]:
        """
        Simplified tile-based rendering.
        Returns: rgb (H,W,3), depth (H,W), normal (H,W,3), alpha_map (H,W), opacity_entropy (H,W)

        For scale: production code should use CUDA rasterizer.
        This CPU/GPU portable implementation is for training integration clarity.
        """
        means_2d, sigmas_2d, depths = self.project_gaussians(gaussians, camera_intrinsics, camera_pose)

        # Visibility filter: keep only Gaussians with depth > 0 and in frustum
        visible = (depths > 0.01) & (means_2d[:, 0] > 0) & (means_2d[:, 0] < W) \
                  & (means_2d[:, 1] > 0) & (means_2d[:, 1] < H)
        vis_idx = visible.nonzero(as_tuple=True)[0]
        if vis_idx.numel() == 0:
            return {'rgb': torch.zeros(H, W, 3), 'depth': torch.zeros(H, W),
                    'normal': torch.zeros(H, W, 3), 'alpha_map': torch.zeros(H, W),
                    'opacity_entropy': torch.zeros(H, W)}

        # Sort visible Gaussians by depth (front to back for alpha compositing)
        vis_depths = depths[vis_idx]
        sort_idx = vis_depths.argsort()
        idx_sorted = vis_idx[sort_idx]

        # Rasterize: compute per-pixel Gaussian contributions
        rgb_out     = torch.zeros(H * W, 3, device=gaussians.means.device)
        depth_out   = torch.zeros(H * W, device=gaussians.means.device)
        normal_out  = torch.zeros(H * W, 3, device=gaussians.means.device)
        transmit    = torch.ones(H * W, device=gaussians.means.device)
        alpha_acc   = torch.zeros(H * W, device=gaussians.means.device)
        H_alpha_acc = torch.zeros(H * W, device=gaussians.means.device)

        # Pixel grid
        yy, xx = torch.meshgrid(torch.arange(H, device=gaussians.means.device, dtype=torch.float32),
                                 torch.arange(W, device=gaussians.means.device, dtype=torch.float32),
                                 indexing='ij')
        pixels = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=-1)  # (HW, 2)

        normals = gaussians.normals  # (N, 3)

        # Alpha-blending (Eq. 3) — process each Gaussian in depth order
        # For production use CUDA tile rasterizer; this loop illustrates the math
        for g_idx in idx_sorted[:min(500, len(idx_sorted))]:  # cap for smoke test
            mu2d = means_2d[g_idx]           # (2,)
            sig2d = sigmas_2d[g_idx]         # (2, 2)
            alpha_g = gaussians.opacities[g_idx]   # scalar
            color_g = gaussians.colors[g_idx]      # (3,)
            depth_g = depths[g_idx]                # scalar
            normal_g = normals[g_idx]              # (3,)

            # Evaluate 2D Gaussian at all pixel positions
            diff = pixels - mu2d.unsqueeze(0)      # (HW, 2)
            try:
                sig_inv = torch.linalg.inv(sig2d)
            except:
                sig_inv = torch.eye(2, device=sig2d.device)
            mahal = (diff @ sig_inv * diff).sum(dim=-1)  # (HW,) Mahalanobis²
            gauss_val = torch.exp(-0.5 * mahal.clamp(max=20))  # (HW,)

            omega = alpha_g * gauss_val                            # (HW,) contribution
            omega_weighted = transmit * omega                      # (HW,) visible contribution

            rgb_out    += omega_weighted.unsqueeze(-1) * color_g.unsqueeze(0)
            depth_out  += omega_weighted * depth_g
            normal_out += omega_weighted.unsqueeze(-1) * normal_g.unsqueeze(0)
            alpha_acc  += omega_weighted

            # Per-pixel opacity entropy accumulation (for entropy alignment)
            p_safe = omega_weighted.clamp(min=1e-8)
            H_alpha_acc -= omega_weighted * torch.log(p_safe)     # -ω log ω

            transmit = transmit * (1.0 - omega)                  # update transmittance

        return {
            'rgb':             rgb_out.reshape(H, W, 3),
            'depth':           depth_out.reshape(H, W),
            'normal':          F.normalize(normal_out, dim=-1).reshape(H, W, 3),
            'alpha_map':       alpha_acc.reshape(H, W),
            'opacity_entropy': H_alpha_acc.reshape(H, W),
        }


# ─── SECTION 4: SNRI — Surface Neighborhood Redundancy Index ──────────────────

def compute_snri(
    means: Tensor,      # (N, 3) primitive centers
    normals: Tensor,    # (N, 3) primitive surface normals
    knn_idx: Tensor,    # (N, K) pre-computed k-nearest-neighbor indices
    alpha_decay: float,
    epsilon: float,
) -> Tensor:
    """
    Surface Neighborhood Redundancy Index (Eq. 6).

    SNRI(i) = (1/K) Σ_j exp(-α · [ε - |n_i · (μ_j - μ_i)|]₊)

    Geometric interpretation:
      d_j = |n_i · (μ_j - μ_i)|  measures how far neighbor j is from
                                   the surface tangent plane of primitive i.
      d_j small → j is tangential (same surface) → contribution is LOW
      d_j large → j is stacked along normal → contribution is HIGH (≈1)

    Output: SNRI ∈ [0,1] per primitive
      → Low SNRI: tangentially distributed (legitimate surface coverage)
      → High SNRI: normal-direction stacking (true geometric redundancy)

    This gating prevents over-sparsification of meaningful surface detail.
    """
    N, K = knn_idx.shape
    mu_neighbors = means[knn_idx.reshape(-1)].reshape(N, K, 3)  # (N, K, 3)
    delta = mu_neighbors - means.unsqueeze(1)                        # (N, K, 3)
    # Projected distance along surface normal of primitive i
    d_j = (normals.unsqueeze(1) * delta).sum(dim=-1).abs()           # (N, K) |n_i · Δμ|
    # Per-neighbor contribution: exp(-α · [ε - d_j]₊)
    hinge = F.relu(epsilon - d_j)                                     # (N, K), zero when d_j > ε
    contribution = torch.exp(-alpha_decay * hinge)                    # (N, K) ∈ [0, 1]
    snri = contribution.mean(dim=-1)                                 # (N,)
    return snri


def build_knn_graph(means: Tensor, K: int) -> Tensor:
    """
    Build K-nearest-neighbor graph over Gaussian centers.
    Fixed at iteration 20,000 after primitives stabilize near surfaces.
    One-time cost, amortized over remaining 10,000 iterations.

    Returns: knn_idx (N, K) — indices of K nearest neighbors per primitive.
    """
    N = means.shape[0]
    # Pairwise distance matrix (efficient via |a-b|² = |a|² - 2aᵀb + |b|²)
    dist2 = (means.unsqueeze(0) - means.unsqueeze(1)).pow(2).sum(dim=-1)  # (N, N)
    # Exclude self (set diagonal to infinity)
    dist2.fill_diagonal_(float('inf'))
    # K nearest by distance
    _, knn_idx = dist2.topk(K, dim=-1, largest=False)                     # (N, K)
    return knn_idx.detach()   # detach: topology is fixed, not optimized


# ─── SECTION 5: Image Entropy Computation ─────────────────────────────────────

def compute_image_entropy(image: Tensor, window_size: int) -> Tensor:
    """
    Local image entropy via sliding-window analysis (Eq. 7).

    E_img(x,y) = -Σ_{(u,v)∈Ω_{k×k}(x,y)} p(u,v) log₂ p(u,v)

    where p(u,v) is the normalized intensity histogram probability in
    the k×k window centered at (x,y).

    Implementation: unfold the image into patches, compute histogram
    entropy per patch, fold back to image shape.

    image: (H, W) grayscale or (H, W, C) → averaged to grayscale
    Returns: (H, W) entropy map
    """
    if image.dim() == 3:
        image = image.mean(dim=-1)  # (H, W) grayscale
    H, W = image.shape
    pad = window_size // 2
    image_padded = F.pad(image.unsqueeze(0).unsqueeze(0), [pad]*4, mode='reflect')[0, 0]
    # Unfold into overlapping patches
    patches = image_padded.unfold(0, window_size, 1).unfold(1, window_size, 1)
    patches = patches.reshape(H, W, -1)   # (H, W, k*k)
    # Discretize to 32 bins for histogram entropy
    num_bins = 32
    patches_q = (patches * (num_bins - 1)).long().clamp(0, num_bins - 1)
    entropy_map = torch.zeros(H, W, device=image.device)
    for b in range(num_bins):
        p = (patches_q == b).float().mean(dim=-1)   # (H, W) histogram prob at bin b
        safe_p = p.clamp(min=1e-10)
        entropy_map -= p * torch.log2(safe_p)         # -p log₂ p (only nonzero where p>0)
    return entropy_map


def compute_primitive_image_entropy(
    alpha_map: Tensor,       # (H, W) rendered opacity map
    image_entropy: Tensor,   # (H, W) image entropy map
    means_2d: Tensor,        # (N, 2) projected Gaussian centers
    H: int, W: int,
) -> Tensor:
    """
    Per-primitive image entropy (Eq. 8): weighted average over influenced pixels.

    E_img_k = Σ_{x,y} α_k(x,y) · E_img(x,y)  /  Σ_{x,y} α_k(x,y)

    For efficiency, approximates each primitive's pixel influence as a small
    window around its projected center (avoids full alpha map per primitive).
    """
    N = means_2d.shape[0]
    primitive_entropy = torch.zeros(N, device=alpha_map.device)
    r = 5  # local window radius around projected center

    for k in range(N):
        cx, cy = means_2d[k, 0].long().clamp(0, W-1), means_2d[k, 1].long().clamp(0, H-1)
        x0, x1 = (cx - r).clamp(0, W-1), (cx + r).clamp(0, W-1)
        y0, y1 = (cy - r).clamp(0, H-1), (cy + r).clamp(0, H-1)
        alpha_window   = alpha_map[y0:y1+1, x0:x1+1]
        entropy_window = image_entropy[y0:y1+1, x0:x1+1]
        denom = alpha_window.sum() + 1e-8
        primitive_entropy[k] = (alpha_window * entropy_window).sum() / denom

    return primitive_entropy   # (N,)


# ─── SECTION 6: Opacity Entropy in 3D Neighborhoods ──────────────────────────

def compute_neighborhood_opacity_entropy(
    opacities: Tensor,   # (N,) primitive opacities
    knn_idx: Tensor,     # (N, K) neighbor indices
    eps: float = 1e-8,
) -> Tensor:
    """
    Neighborhood-based entropy (Eq. 10 inner term).

    For each primitive k, treats the opacities of its K neighbors as
    a probability distribution and computes Shannon entropy.

    This is the 1280× faster approximation to ray-based opacity entropy
    (Eq. 5): the same redundancy manifests whether you look along a ray
    or within a spatial neighborhood of the redundant primitives.

    Returns: H_alpha (N,) neighborhood entropy per primitive.
    """
    N, K = knn_idx.shape
    # Gather neighbor opacities
    neighbor_opacities = opacities[knn_idx.reshape(-1)].reshape(N, K)  # (N, K)
    # Normalize to pseudo-probability distribution per neighborhood
    total = neighbor_opacities.sum(dim=-1, keepdim=True) + eps
    alpha_tilde = neighbor_opacities / total                              # (N, K)
    # Shannon entropy: H(α) = -Σ ã log ã
    safe_p = alpha_tilde.clamp(min=eps)
    H_alpha = -(alpha_tilde * safe_p.log()).sum(dim=-1)                 # (N,)
    return H_alpha


def compute_adaptive_threshold(
    snri: Tensor,       # (N,)
    sigma_min: float,
    epsilon_stab: float,
    beta_coupling: float,
) -> Tensor:
    """
    Adaptive entropy threshold η_k (Eq. 11).

    η_k = ½ ln(2πe σ²_min) + ε + β·SNRI_k

    The base threshold ½ ln(2πe σ²_min) is the differential entropy of a
    Gaussian with variance σ²_min — the minimum entropy for a well-defined
    primitive. β = -σ²_min lowers the threshold for high-SNRI primitives
    (normal-stacked → penalized earlier) and raises it for low-SNRI ones
    (tangential → given more flexibility).
    """
    base = 0.5 * math.log(2 * math.pi * math.e * sigma_min**2)
    eta = base + epsilon_stab + beta_coupling * snri   # (N,)
    return eta


class EntropySparsityLoss(nn.Module):
    """
    Entropy-regularized sparsity constraint (Eq. 10-11, Section 3.2).

    L_sparsity = Σ_k  w_img_k · max(0, H(α_k) - η_k)²

    Only penalizes primitives whose neighborhood entropy EXCEEDS the
    adaptive threshold η_k. The hinge ensures no gradient flows for
    well-behaved, already-sparse neighborhoods.

    The image entropy weights w_img_k provide spatial adaptivity:
      - Low weight at complex image regions → relax sparsification pressure
      - High weight at simple image regions → aggressive sparsification
    """
    def __init__(self, cfg: GEFConfig):
        super().__init__()
        self.cfg = cfg

    def forward(
        self,
        gaussians: GaussianPrimitives,
        knn_idx: Tensor,
        primitive_img_entropy: Tensor,  # (N,) per-primitive image entropy
    ) -> Tensor:
        """Returns scalar sparsity loss."""
        cfg = self.cfg
        N = gaussians.means.shape[0]

        # SNRI: identify truly redundant (normal-stacked) primitives
        snri = compute_snri(
            gaussians.means, gaussians.normals, knn_idx,
            cfg.alpha_snri, cfg.epsilon
        )   # (N,)

        # Neighborhood opacity entropy
        H_alpha = compute_neighborhood_opacity_entropy(
            gaussians.opacities, knn_idx, eps=cfg.eps_entropy
        )   # (N,)

        # Adaptive entropy threshold η_k
        eta = compute_adaptive_threshold(
            snri, cfg.sigma_min, cfg.eps_entropy, cfg.beta_coupling
        )   # (N,)

        # Image entropy-guided weight (Eq. 9): inverse normalization
        E_min = primitive_img_entropy.min()
        E_max = primitive_img_entropy.max() + cfg.eps_entropy
        w_img = 1.0 - (primitive_img_entropy - E_min) / (E_max - E_min)  # (N,) ∈ [0,1]

        # Exclude tangential primitives (low SNRI) from sparsification gate
        # (they provide legitimate surface coverage, not redundancy)
        tangential_mask = snri < 0.3   # primitives spreading along surface → preserve
        w_img = w_img * (~tangential_mask).float()

        # Hinge loss: penalize entropy exceeding threshold (Eq. 10)
        excess = F.relu(H_alpha - eta)           # (N,) only positive violations
        loss = (w_img * excess.pow(2)).sum()    # weighted squared hinge
        return loss


# ─── SECTION 7: Multi-Scale Entropy Alignment ─────────────────────────────────

class MultiScaleEntropyAlignment(nn.Module):
    """
    Multi-scale entropy alignment with competitive cross-scale weighting (Eq. 13-14, Section 3.3).

    Prevents excessive smoothing at geometric discontinuities by enforcing
    that rendered opacity entropy matches image entropy at corresponding
    locations across multiple spatial scales.

    Three scales:
      l=1: k=3×3  fine (textures, sharp edges)       → wins in high-detail regions
      l=2: k=5×5  medium (intermediate structure)    → wins in moderate regions
      l=3: k=9×9  coarse (smooth, homogeneous areas) → wins in flat regions

    Competitive softmax ensures spatial exclusivity:
      >85% of pixels dominated by one scale (concentration metric > 0.6).
    """
    def __init__(self, cfg: GEFConfig):
        super().__init__()
        self.cfg = cfg
        self.window_sizes = [2*l + 1 for l in range(1, cfg.num_scales + 1)]  # [3, 5, 7]

    def compute_multi_scale_image_entropy(self, image: Tensor) -> List[Tensor]:
        """Compute image entropy maps at each scale."""
        entropy_maps = []
        for k in self.window_sizes:
            E = compute_image_entropy(image, window_size=k)   # (H, W)
            entropy_maps.append(E)
        return entropy_maps   # L × (H, W)

    def competitive_weights(self, entropy_maps: List[Tensor]) -> List[Tensor]:
        """
        Compute competitive softmax weights across scales (Eq. 13).

        W_l(x,y) = exp(β·E^l_norm) / Σ_j exp(β·E^j_norm)

        E^l_norm: percentile-normalized entropy at scale l (per-scale,
        to avoid global normalization artifacts from different magnitudes
        at different window sizes).
        """
        beta = self.cfg.beta_temperature
        L = len(entropy_maps)
        H, W = entropy_maps[0].shape

        # Percentile-normalize each scale independently
        normed = []
        for E in entropy_maps:
            p5  = torch.quantile(E, 0.05)
            p95 = torch.quantile(E, 0.95)
            E_n = (E - p5) / (p95 - p5 + 1e-8)
            normed.append(E_n.clamp(0, 1))

        # Stack and apply competitive softmax (Eq. 13)
        stacked = torch.stack(normed, dim=0)      # (L, H, W)
        weights = F.softmax(beta * stacked, dim=0) # (L, H, W), sums to 1 across scales
        return [weights[l] for l in range(L)]      # L × (H, W)

    def forward(
        self,
        rendered_entropy: Tensor,   # (H, W) rendered opacity entropy map
        image: Tensor,              # (H, W) or (H, W, 3) ground-truth image
    ) -> Tensor:
        """
        Multi-scale alignment loss (Eq. 14).

        L_align = Σ_l λ_l · Σ_i  W^l_i · |H^l_i - E^l_i|²

        H^l_i: rendered opacity entropy at scale l, pixel i
               (approximated by blurring the pixel-wise rendered entropy)
        E^l_i: image entropy at scale l, pixel i
        W^l_i: competitive scale weight at scale l, pixel i
        """
        H, W = rendered_entropy.shape
        cfg = self.cfg

        # Multi-scale image entropy maps
        target_entropies = self.compute_multi_scale_image_entropy(image)  # L × (H, W)

        # Competitive weights
        weights = self.competitive_weights(target_entropies)               # L × (H, W)

        total_loss = torch.tensor(0.0, device=rendered_entropy.device)
        for l, (target_E, weight_W, lam) in enumerate(
            zip(target_entropies, weights, cfg.lambda_scales)):
            # Multi-scale rendered entropy: blur the rendered entropy map at scale l
            k = self.window_sizes[l]
            pad = k // 2
            rendered_blurred = F.avg_pool2d(
                rendered_entropy.unsqueeze(0).unsqueeze(0),
                kernel_size=k, stride=1, padding=pad
            )[0, 0]   # (H, W)
            # Weighted MSE between rendered and target entropy at this scale
            diff = (rendered_blurred - target_E).pow(2)   # (H, W)
            total_loss = total_loss + lam * (weight_W * diff).sum()

        return total_loss


# ─── SECTION 8: Delayed Entropy Cache ─────────────────────────────────────────

class EntropyCache:
    """
    Delayed entropy update strategy (Eq. 15-16, Section 3.3).

    Configurational entropy distributions evolve much slower than
    individual Gaussian parameters during optimization. This temporal
    scale separation justifies a computationally efficient strategy:
      θ̇ = -∇_θ L(θ, H_cached)     [optimize under fixed entropy maps]
      H_cached(t+Δt) = H_render(θ(t+Δt))  [update every N iterations]

    Achieves 3-4× speedup with <1% quality degradation when N=10-20.
    """
    def __init__(self, update_interval: int = 10):
        self.N = update_interval
        self._cached_entropy: Optional[Tensor] = None
        self._last_update: int = -1

    def update_if_needed(self, iteration: int, new_entropy: Tensor) -> Tensor:
        """Return cached entropy, refreshing every N iterations."""
        if (self._cached_entropy is None or
                (iteration - self._last_update) >= self.N):
            self._cached_entropy = new_entropy.detach()   # detach from gradient graph
            self._last_update = iteration
        return self._cached_entropy

    @property
    def has_cache(self) -> bool:
        return self._cached_entropy is not None


# ─── SECTION 9: Depth and Normal Consistency Losses ───────────────────────────

def depth_consistency_loss(
    depth_map: Tensor,   # (H, W) rendered depth
    alpha_map: Tensor,   # (H, W) rendered opacity (blending weights)
) -> Tensor:
    """
    Depth consistency loss from 2DGS (Eq. 12).

    L_depth = Σ_{i,j∈r} ω_i ω_j ‖d_i - d_j‖²₂

    Penalizes variance in depth estimates within local ray neighborhoods,
    promoting convergence toward sharp, consistent depth estimates.

    Approximated here over local pixel neighborhoods (3×3 windows)
    weighted by rendered opacity to match the ray-based formulation.
    """
    depth = depth_map.unsqueeze(0).unsqueeze(0)    # (1, 1, H, W)
    alpha = alpha_map.unsqueeze(0).unsqueeze(0)    # (1, 1, H, W)

    # Local depth mean via average pooling (neighborhood estimate of depth)
    depth_mean = F.avg_pool2d(depth, kernel_size=3, stride=1, padding=1)
    depth_var  = F.avg_pool2d(depth**2, kernel_size=3, stride=1, padding=1) - depth_mean**2
    depth_var  = depth_var.clamp(min=0)

    # Weight by opacity: inconsistency in low-alpha regions matters less
    alpha_weight = alpha.detach()
    loss = (alpha_weight * depth_var).mean()
    return loss


def normal_consistency_loss(normal_map: Tensor, depth_map: Tensor) -> Tensor:
    """
    Normal consistency loss (Eq. 17).

    L_normal = Σ_{i∈r} ω_i (1 - n_i^⊤ n')

    n'  = surface normal from finite differences on depth map
    n_i = Gaussian primitive's normal (principal axis of covariance)

    The term (1 - n_i^⊤ n') measures angular difference between the
    predicted Gaussian normal and the depth-derived surface orientation.
    Encourages Gaussian primitives to align tangentially to the surface.
    """
    # Finite difference normals from depth map
    # n' = normalize([-dZ/dx, -dZ/dy, 1])
    dz_dx = depth_map[:, 1:] - depth_map[:, :-1]    # (H, W-1)
    dz_dy = depth_map[1:, :] - depth_map[:-1, :]    # (H-1, W)
    # Pad to restore original size
    dz_dx = F.pad(dz_dx, [0, 1])    # (H, W)
    dz_dy = F.pad(dz_dy, [0, 0, 0, 1])  # (H, W)
    normal_fd = torch.stack([-dz_dx, -dz_dy, torch.ones_like(dz_dx)], dim=-1)
    normal_fd = F.normalize(normal_fd, dim=-1)   # (H, W, 3)

    # Angular difference: 1 - n^⊤ n' (zero when perfectly aligned)
    cos_sim = (normal_map * normal_fd).sum(dim=-1)  # (H, W)
    loss = (1.0 - cos_sim).mean()
    return loss


# ─── SECTION 10: Complete GEF Loss Function ───────────────────────────────────

class GEFLoss(nn.Module):
    """
    Complete GEF training loss (Eq. 18-19, Section 3.4).

    L = L_photometric + L_entropy + L_geom

    L_photometric = (1-λ_ssim)·L1 + λ_ssim·(1-SSIM)   [standard 3DGS loss]
    L_entropy = λ_sparsity·L_sparsity + λ_align·L_align [entropy field losses]
    L_geom    = L_depth + λ_normal·L_normal              [geometric consistency]

    Training schedule (critical for stable convergence):
      iter 0-15k:   photometric only  (warm up geometry)
      iter 15k+:    + geometric losses (align normals with surface)
      iter 20k+:    + entropy losses   (sparsify redundant primitives)

    The delayed entropy activation ensures primitives are already near surfaces
    before entropy-driven sparsification begins, preventing premature pruning.
    """
    def __init__(self, cfg: GEFConfig):
        super().__init__()
        self.cfg = cfg
        self.sparsity_loss = EntropySparsityLoss(cfg)
        self.align_loss    = MultiScaleEntropyAlignment(cfg)

    @staticmethod
    def ssim_loss(rendered: Tensor, gt: Tensor, window_size: int = 11) -> Tensor:
        """Simplified SSIM-based perceptual loss."""
        C1, C2 = 0.01**2, 0.03**2
        mu_r = F.avg_pool2d(rendered, window_size, stride=1, padding=window_size//2)
        mu_g = F.avg_pool2d(gt,       window_size, stride=1, padding=window_size//2)
        var_r = F.avg_pool2d(rendered**2, window_size, stride=1, padding=window_size//2) - mu_r**2
        var_g = F.avg_pool2d(gt**2,       window_size, stride=1, padding=window_size//2) - mu_g**2
        cov   = F.avg_pool2d(rendered * gt, window_size, stride=1, padding=window_size//2) - mu_r * mu_g
        ssim  = ((2 * mu_r * mu_g + C1) * (2 * cov + C2)) / \
                ((mu_r**2 + mu_g**2 + C1) * (var_r + var_g + C2))
        return 1.0 - ssim.mean()

    def photometric_loss(self, rendered_rgb: Tensor, gt_rgb: Tensor) -> Tensor:
        """Standard 3DGS photometric loss: L1 + SSIM."""
        l1   = F.l1_loss(rendered_rgb, gt_rgb)
        ssim = self.ssim_loss(
            rendered_rgb.permute(2, 0, 1).unsqueeze(0),
            gt_rgb.permute(2, 0, 1).unsqueeze(0),
        )
        return (1.0 - self.cfg.lambda_ssim) * l1 + self.cfg.lambda_ssim * ssim

    def forward(
        self,
        rendered: Dict[str, Tensor],         # renderer output dict
        gt_rgb: Tensor,                       # (H, W, 3) ground-truth image
        gaussians: GaussianPrimitives,
        knn_idx: Optional[Tensor],
        primitive_img_entropy: Optional[Tensor],
        entropy_cache: Optional[EntropyCache],
        iteration: int,
    ) -> Dict[str, Tensor]:
        """
        Compute all GEF losses with training schedule gating.

        Returns dict with 'total', 'photometric', 'sparsity', 'align',
        'depth', 'normal' keys (individual losses for logging).
        """
        cfg = self.cfg
        losses = {}

        # ── L_photometric (always active) ────────────────────────────────────
        l_photo = self.photometric_loss(rendered['rgb'], gt_rgb)
        losses['photometric'] = l_photo
        total = l_photo

        # ── L_geom (active from iter 15,000) ─────────────────────────────────
        l_depth, l_normal = torch.tensor(0.0), torch.tensor(0.0)
        if iteration >= cfg.geom_start_iter:
            l_depth  = depth_consistency_loss(rendered['depth'], rendered['alpha_map'])
            l_normal = normal_consistency_loss(rendered['normal'], rendered['depth'])
            total = total + cfg.lambda_depth * l_depth + cfg.lambda_normal * l_normal
        losses['depth']  = l_depth
        losses['normal'] = l_normal

        # ── L_entropy (active from iter 20,000, requires k-NN graph) ─────────
        l_sparse, l_align = torch.tensor(0.0), torch.tensor(0.0)
        if iteration >= cfg.entropy_start_iter and knn_idx is not None:
            # Entropy sparsity: penalize high-entropy neighborhoods
            if primitive_img_entropy is not None:
                l_sparse = self.sparsity_loss(gaussians, knn_idx, primitive_img_entropy)
            # Entropy alignment: match rendered entropy distribution to image entropy
            cached_entropy = rendered['opacity_entropy']
            if entropy_cache is not None:
                cached_entropy = entropy_cache.update_if_needed(iteration, cached_entropy)
            l_align = self.align_loss(cached_entropy, gt_rgb)
            total = total + cfg.lambda_sparsity * l_sparse + cfg.lambda_align * l_align
        losses['sparsity'] = l_sparse
        losses['align']    = l_align
        losses['total']    = total
        return losses


# ─── SECTION 11: GEF Trainer ──────────────────────────────────────────────────

class GEFTrainer:
    """
    GEF training loop with full optimization schedule.

    Mirrors the paper's implementation on NVIDIA RTX 4090:
      - Geometric regularization starts at iter 15,000
      - Entropy field regularization starts at iter 20,000
      - k-NN graph built once at iter 20,000 (primitives near surfaces)
      - Entropy maps cached and updated every N=10-20 iterations
      - Total 30,000 iterations

    For full 3DGS densification (splitting/pruning based on opacity and
    gradient magnitude): see the official 3DGS repo implementation which
    manages adaptive control of Gaussian count.
    """
    def __init__(self, gaussians: GaussianPrimitives, cfg: GEFConfig):
        self.gaussians = gaussians
        self.cfg = cfg
        self.loss_fn = GEFLoss(cfg)
        self.renderer = SimpleGaussianRenderer(cfg)
        self.entropy_cache = EntropyCache(cfg.entropy_update_interval)
        self.knn_idx: Optional[Tensor] = None
        self.optimizer = torch.optim.Adam([
            {'params': gaussians.means,          'lr': 1.6e-4},
            {'params': gaussians.log_scales,      'lr': 5e-3},
            {'params': gaussians.quats,           'lr': 1e-3},
            {'params': gaussians.colors,          'lr': 2.5e-3},
            {'params': gaussians.opacity_logits,  'lr': 5e-2},
        ])

    def _build_knn_if_needed(self, iteration: int) -> None:
        """Build k-NN graph at iter 20,000 when primitives stabilize near surfaces."""
        if iteration == self.cfg.entropy_start_iter and self.knn_idx is None:
            print(f"  [iter {iteration}] Building k-NN graph (K={self.cfg.K})...")
            self.knn_idx = build_knn_graph(self.gaussians.means.detach(), self.cfg.K)
            print(f"  k-NN graph built. Shape: {self.knn_idx.shape}")

    def train_step(
        self,
        iteration: int,
        camera_intrinsics: Dict,
        camera_pose: Tensor,
        gt_rgb: Tensor,        # (H, W, 3) ground-truth image
        H: int, W: int,
    ) -> Dict[str, float]:
        """Single optimization iteration."""
        self._build_knn_if_needed(iteration)
        self.optimizer.zero_grad()

        # Render scene
        rendered = self.renderer.render(self.gaussians, camera_intrinsics, camera_pose, H, W)

        # Compute per-primitive image entropy (only when entropy losses are active)
        primitive_img_entropy = None
        if iteration >= self.cfg.entropy_start_iter and self.knn_idx is not None:
            img_entropy = compute_image_entropy(gt_rgb, window_size=7)
            means_2d, _, _ = self.renderer.project_gaussians(
                self.gaussians, camera_intrinsics, camera_pose)
            primitive_img_entropy = compute_primitive_image_entropy(
                rendered['alpha_map'].detach(), img_entropy,
                means_2d.detach(), H, W
            )

        # Compute losses
        losses = self.loss_fn(
            rendered, gt_rgb, self.gaussians, self.knn_idx,
            primitive_img_entropy, self.entropy_cache, iteration
        )

        # Backpropagate and optimize
        losses['total'].backward()
        torch.nn.utils.clip_grad_norm_(self.gaussians.parameters(), 1.0)
        self.optimizer.step()

        return {k: v.item() if isinstance(v, Tensor) else v
                for k, v in losses.items()}


# ─── SECTION 12: Mesh Extraction & Smoke Test ─────────────────────────────────

class TSDFMeshExtractor:
    """
    Mesh extraction via TSDF fusion + Marching Cubes (Section 4, Impl. Details).

    Standard GEF mesh pipeline (mirrors paper implementation):
      1. Render depth maps for all training views
      2. Fuse depth maps into TSDF volume (Curless & Levoy, 1996)
      3. Extract mesh via Marching Cubes (Lorensen & Cline, 1987)

    For production: use open3d.pipelines.integration.ScalableTSDFVolume
    which handles large scenes more efficiently.

    Returns a triangle mesh as numpy arrays (vertices, faces).
    """
    def __init__(self, voxel_size: float = 0.02, trunc_dist_factor: float = 3.0):
        self.voxel_size = voxel_size
        self.trunc_dist = voxel_size * trunc_dist_factor

    def fuse_depth_maps(
        self,
        depth_maps: List[Tensor],        # list of (H, W) depth maps
        poses: List[Tensor],             # list of (4, 4) camera-to-world poses
        intrinsics: Dict[str, float],
        volume_bounds: Tuple,            # (min_xyz, max_xyz) tuples
    ) -> Tuple[np.ndarray, np.ndarray]:
        """
        Simplified TSDF fusion and Marching Cubes.

        Full implementation requires open3d or custom CUDA TSDF.
        This illustrates the algorithmic pipeline:
          1. Allocate TSDF voxel volume
          2. For each depth map, update TSDF values along view frustum
          3. Extract isosurface at TSDF=0 via Marching Cubes
        """
        try:
            import open3d as o3d
            volume = o3d.pipelines.integration.ScalableTSDFVolume(
                voxel_length=self.voxel_size,
                sdf_trunc=self.trunc_dist,
                color_type=o3d.pipelines.integration.TSDFVolumeColorType.NoColor
            )
            fx = intrinsics['fx']; fy = intrinsics['fy']
            cx = intrinsics['cx']; cy = intrinsics['cy']
            H, W = depth_maps[0].shape
            cam_intrinsic = o3d.camera.PinholeCameraIntrinsic(W, H, fx, fy, cx, cy)
            for depth, pose in zip(depth_maps, poses):
                depth_np = depth.cpu().numpy().astype(np.float32)
                depth_o3d = o3d.geometry.Image(depth_np)
                dummy_rgb = o3d.geometry.Image(np.ones((H, W, 3), dtype=np.uint8) * 128)
                rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
                    dummy_rgb, depth_o3d, depth_scale=1.0, depth_trunc=5.0)
                extrinsic = np.linalg.inv(pose.cpu().numpy())
                volume.integrate(rgbd, cam_intrinsic, extrinsic)
            mesh = volume.extract_triangle_mesh()
            mesh.compute_vertex_normals()
            vertices = np.asarray(mesh.vertices)
            faces    = np.asarray(mesh.triangles)
            return vertices, faces
        except ImportError:
            print("[TSDFExtractor] open3d not available. Install: pip install open3d")
            # Return placeholder mesh
            vertices = np.zeros((4, 3), dtype=np.float32)
            faces    = np.array([[0,1,2],[0,2,3]], dtype=np.int32)
            return vertices, faces


def run_smoke_test(device_str: str = "cpu", n_iters: int = 5):
    device = torch.device(device_str)
    print("=" * 65)
    print("  GEF — Gaussian Entropy Fields — Full Architecture Smoke Test")
    print("=" * 65)
    torch.manual_seed(42)

    # Tiny config for fast CPU test
    cfg = GEFConfig(num_gaussians=200, img_h=32, img_w=32, K=10,
                    geom_start_iter=2, entropy_start_iter=3, total_iters=5)

    # [1] Initialize Gaussian primitives
    print("\n[1/5] Initializing Gaussian primitives...")
    gaussians = GaussianPrimitives(cfg.num_gaussians, init_scale=0.02).to(device)
    n_params = sum(p.numel() for p in gaussians.parameters())
    print(f"  Primitives: {cfg.num_gaussians}  |  Parameters: {n_params:,}")

    # [2] Test SNRI computation
    print("\n[2/5] Testing SNRI computation...")
    knn_idx = build_knn_graph(gaussians.means.detach(), cfg.K)
    snri = compute_snri(gaussians.means, gaussians.normals, knn_idx,
                        cfg.alpha_snri, cfg.epsilon)
    print(f"  SNRI shape: {tuple(snri.shape)}  min={snri.min():.3f}  max={snri.max():.3f}")
    assert snri.shape == (cfg.num_gaussians,)
    assert 0.0 <= snri.min().item() and snri.max().item() <= 1.0

    # [3] Test image entropy
    print("\n[3/5] Testing image entropy computation...")
    dummy_image = torch.rand(cfg.img_h, cfg.img_w, 3, device=device)
    entropy_map = compute_image_entropy(dummy_image, window_size=3)
    print(f"  Image entropy shape: {tuple(entropy_map.shape)}  range=[{entropy_map.min():.3f}, {entropy_map.max():.3f}]")

    # [4] Test multi-scale alignment
    print("\n[4/5] Testing multi-scale entropy alignment...")
    align_module = MultiScaleEntropyAlignment(cfg)
    rendered_entropy = torch.rand(cfg.img_h, cfg.img_w, device=device)
    l_align = align_module(rendered_entropy, dummy_image)
    print(f"  Multi-scale alignment loss: {l_align.item():.4f}")

    # [5] Full training loop
    print(f"\n[5/5] Full training loop ({n_iters} steps with tiny config)...")
    trainer = GEFTrainer(gaussians, cfg)
    camera_intrinsics = {'fx': 20.0, 'fy': 20.0, 'cx': 16.0, 'cy': 16.0}
    camera_pose = torch.eye(4, device=device)
    camera_pose[2, 3] = 2.0    # camera 2m in front of origin
    gt_rgb = torch.rand(cfg.img_h, cfg.img_w, 3, device=device)

    for step in range(n_iters):
        losses = trainer.train_step(step, camera_intrinsics, camera_pose, gt_rgb,
                                    cfg.img_h, cfg.img_w)
        loss_str = "  |  ".join(f"{k}={v:.4f}" for k, v in losses.items() if k != 'total')
        print(f"  iter {step+1}/{n_iters}  total={losses['total']:.4f}  |  {loss_str}")

    print("\n" + "=" * 65)
    print("✓  All checks passed. GEF is ready for full-scale training.")
    print("=" * 65)
    print("""
Next steps:
  1. Get the official 3DGS codebase (Gaussian primitive densification,
     tile-based CUDA rasterizer — required for full speed):
       git clone https://github.com/graphdeco-inria/gaussian-splatting
       pip install submodules/diff-gaussian-rasterization
       pip install submodules/simple-knn

  2. Download benchmarks:
       DTU:         https://roboimagedata.compute.dtu.dk/?page_id=36
       Tanks & Temples: https://www.tanksandtemples.org/download/
       Mip-NeRF 360: https://jonbarron.info/mipnerf360/

  3. Install mesh extraction dependency:
       pip install open3d

  4. Scale to full config:
       cfg = GEFConfig(num_gaussians=1_000_000, K=50, img_h=540, img_w=960)

  5. Run on RTX 4090 (as in paper):
       Expected training time ~23 min for DTU scenes
       Geometric regularization: iter 15,000+
       Entropy field: iter 20,000+  (k-NN fixed at this point)

  6. Hyperparameter tuning guide (from paper):
       N (entropy update interval): 10-20 for most scenes
       β (temperature): 6.0 default; 4-5 for gradual scenes, 8 for sharp
       K (neighbors): 25-64, adaptive to local density
""")
    return gaussians


if __name__ == "__main__":
    run_smoke_test(device_str="cpu", n_iters=5)

Read the Full Paper

The complete study — including per-scene DTU Chamfer Distance scores, qualitative comparisons on Tanks and Temples, UAV photogrammetry results on ISPRS benchmarks, and the full hyperparameter sensitivity analysis — is published open-access in the ISPRS Journal of Photogrammetry and Remote Sensing.

📄 Read the Paper (Open Access) 💻 3DGS Base Framework

Academic Citation:
Kuang, H., & Liu, J. (2026). Gaussian entropy fields: Driving adaptive sparsity in 3D Gaussian optimization. ISPRS Journal of Photogrammetry and Remote Sensing, 236, 273–285. https://doi.org/10.1016/j.isprsjprs.2026.04.010

This article is an independent editorial analysis of open-access peer-reviewed research. The PyTorch implementation is an educational adaptation designed to illustrate the paper’s concepts. For production deployments, the official 3DGS CUDA rasterizer (diff-gaussian-rasterization) is required for the tile-based rendering pipeline. The paper’s experiments used a single NVIDIA RTX 4090 GPU. Mesh extraction requires open3d. Supported by the National Natural Science Foundation of China (Grant No. 42571528) and the Taishan Scholars Programme (Grant No. 202408183).