GEF: What If the Secret to Better 3D Reconstruction Was Treating Surface Uncertainty as Entropy?
Researchers at Shandong University of Science and Technology stopped asking Gaussian primitives to be geometrically correct and instead asked them to be informationally minimal. Their Gaussian Entropy Fields framework achieves the best Chamfer Distance among all Gaussian Splatting methods on DTU and the top SSIM and LPIPS on Mip-NeRF 360 — without ever explicitly telling a primitive what a surface is.
3D Gaussian Splatting arrived in 2023 and immediately redrew the benchmark charts for real-time novel view synthesis. But it carried a structural flaw: Gaussian primitives optimize toward photometric accuracy, not geometric correctness. Nothing in the base formulation stops thousands of semi-transparent splats from stacking on top of each other along a viewing ray, each contributing a little opacity, none committing to a definite surface. The resulting 3D reconstruction looks fine in rendered images but falls apart when you ask it to produce a mesh. Gaussian Entropy Fields (GEF) from Hong Kuang and Jianchen Liu at Shandong USTA reframes this as an information-theoretic problem: a well-reconstructed surface is one whose primitive distribution has low entropy. You stop telling Gaussians where to go and start penalizing them for being uncertain about where they are.
The Root Cause of 3DGS Surface Failures
To understand why GEF works, you first need to understand exactly what goes wrong in vanilla 3DGS surface reconstruction. During alpha-compositing rendering, a pixel’s color is computed by summing the contributions of all Gaussians sorted by depth: each primitive contributes its color, weighted by its opacity and the cumulative transparency of everything in front of it. The photometric loss gradient can be satisfied by many different arrangements of Gaussians — a single opaque Gaussian on the surface, two semi-transparent Gaussians slightly offset from each other, five low-opacity Gaussians spread across half a meter of depth. All of these can produce nearly identical rendered images. The optimizer has no reason to prefer the compact single-surface arrangement over the diffuse multi-Gaussian blob.
This is the geometric ambiguity problem, and it manifests visually as primitive aliasing: elongated Gaussian splats that stretch along viewing directions, clusters of redundant primitives floating slightly off a surface, and fuzzy meshes when depth maps are fused via TSDF. Prior work attacked this by flattening Gaussians into 2D disks (2DGS), adding normal consistency losses (PGSR), or introducing signed distance field guidance. GEF takes a different angle entirely: instead of specifying what a surface should look like geometrically, it specifies what a primitive distribution should look like informationally.
When Gaussian primitives are well-aligned on a surface, the accumulated opacity along any viewing ray through that surface forms a sharp, concentrated spike — like a Dirac delta. When primitives are stacked redundantly, the accumulated opacity forms a broad, multi-peaked distribution. The Shannon entropy of the normalized opacity distribution is therefore a direct, computationally tractable proxy for surface ambiguity. Minimizing local opacity entropy is equivalent to promoting surface determinism.
GEF Architecture — Three Complementary Components
BASE: 3D Gaussian Splatting pipeline (position μ, covariance Σ, color c, opacity α)
│
├─── Standard photometric loss (L1 + SSIM on rendered vs. ground-truth RGB)
│
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│ COMPONENT 1: ENTROPY-REGULARIZED SPARSITY CONSTRAINT (Section 3.2) │
│ │
│ Opacity Field Along Ray: │
│ α(t) = Σ_i w_i · N(t | μ_i, σ²_i) [mixture of Gaussians on ray] │
│ ã(t) = α(t) / ‖α‖₁ [normalize to pseudo-distribution] │
│ H(α) = -∫ ã(t) log ã(t) dt [differential entropy = ambiguity] │
│ │
│ 3D Neighborhood Formulation (1280× faster than per-ray): │
│ Ωk = K-nearest neighbors of primitive k (K=50, fixed at iter 20,000) │
│ │
│ SNRI (Surface Neighborhood Redundancy Index): │
│ SNRI(i) = (1/K) Σ_j exp(-α[ε - |n_i · (μ_j - μ_i)|]+) │
│ → Low SNRI ≈ 0.1 (primitives spread tangentially — keep them) │
│ → High SNRI ≈ 0.9 (primitives stacked along normal — sparsify them) │
│ │
│ Image Entropy-Guided Weight: │
│ E_img(x,y) = -Σ p(u,v) log₂ p(u,v) [sliding window k×k] │
│ E_img_k = Σ α_k(x,y)·E_img(x,y) / Σ α_k(x,y) [primitive-level entropy] │
│ w_img_k = 1 - (E_img_k - E_min)/(E_max - E_min) [inverse: complex→relax] │
│ │
│ Sparsity Loss: │
│ L_sparsity = Σ_k w_img_k · max(0, H(α_k) - η_k)² │
│ η_k = ½ ln(2πe σ²_min) + ε + β·SNRI_k [adaptive entropy threshold] │
└─────┬──────────────────────────────────────────────────────────────────────────┘
│
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│ COMPONENT 2: MULTI-SCALE ENTROPY ALIGNMENT (Section 3.3) │
│ │
│ Multi-Scale Image Entropy Maps: │
│ Scale l=1: window k_1 = 3×3 (fine: textures, sharp edges) │
│ Scale l=2: window k_2 = 5×5 (medium: intermediate structure) │
│ Scale l=3: window k_3 = 9×9 (coarse: smooth regions, backgrounds) │
│ │
│ Competitive Cross-Scale Softmax Weighting (β=6.0): │
│ W_l(x,y) = exp(β·E^l_norm) / Σ_j exp(β·E^j_norm) │
│ → Each spatial location "won" by its most informative scale │
│ → >85% of pixels have max weight > 0.6 (strong spatial specialization) │
│ │
│ Alignment Loss: │
│ L_align = Σ_l λ_l Σ_i W^l_i · |H^l_i - E^l_i|² │
│ (H = rendered opacity entropy, E = image entropy target) │
│ │
│ Delayed Entropy Updates (efficiency): │
│ θ̇ = -∇_θ L(θ, H_cached) [optimize under fixed entropy] │
│ H_cached updated every N=10-20 iterations [3-4× speedup] │
└─────┬──────────────────────────────────────────────────────────────────────────┘
│
┌─────▼──────────────────────────────────────────────────────────────────────────┐
│ COMPONENT 3: GEOMETRIC REGULARIZATION (Section 3.4) │
│ │
│ Depth Consistency Loss (from 2DGS): │
│ L_depth = Σ_{i,j∈r} ω_i ω_j ‖d_i - d_j‖²₂ │
│ │
│ Normal Consistency Loss: │
│ L_normal = Σ_{i∈r} ω_i (1 - n_i^⊤ n') │
│ (n' = surface normal from finite differences on depth map) │
│ │
│ TOTAL LOSS: │
│ L = L_photometric + λ_s·L_sparsity + λ_a·L_align + L_depth + L_normal │
│ │
│ Training schedule: │
│ iter 0–15,000: photometric only (standard 3DGS) │
│ iter 15,000+: + geometric regularization │
│ iter 20,000+: + entropy field losses (k-NN graph fixed here) │
│ iter 30,000: done → TSDF fusion → Marching Cubes mesh │
└─────────────────────────────────────────────────────────────────────────────────┘
Component 1: Entropy-Regularized Sparsity — The Heart of GEF
From Ray Entropy to Neighborhood Entropy
The paper starts with a beautifully intuitive formulation: normalize the accumulated opacity along a viewing ray into a pseudo-probability distribution, then compute its differential entropy. High entropy means the opacity mass is spread across many depth positions — multiple competing Gaussian clusters, no confident surface. Low entropy means the opacity is concentrated at a single depth — one dominant primitive, a clear surface. Minimizing this entropy over all rays would give you geometrically consistent surfaces as an emergent property of information-theoretic optimization.
The problem is scale. With roughly a million rays per training iteration and 128 samples per ray, direct ray-based entropy computation requires 6.4 billion operations per iteration. That’s not a practical training budget. The key observation that rescues the idea is that ray-based opacity disorder and 3D neighborhood opacity disorder are the same phenomenon viewed from different perspectives. Primitives that stack along viewing directions (creating multi-peaked ray opacity profiles) are exactly the primitives that cluster in the normal direction within 3D spatial neighborhoods. So instead of sampling each ray, GEF partitions the scene into overlapping K-nearest-neighbor neighborhoods around each primitive and computes entropy of the normalized opacity histogram within each neighborhood. Same redundancy, same sparsity signal — but only 5 million operations instead of 6.4 billion, a 1,280× speedup.
SNRI: Not All Stacked Primitives Are Redundant
The naive version of neighborhood entropy would penalize any cluster of primitives, including tangentially-distributed primitives that legitimately cover a curved surface from different angles. GEF introduces the Surface Neighborhood Redundancy Index to distinguish these two cases.
For each primitive \(i\), SNRI computes how far each neighbor \(j\) deviates from the surface plane defined by \(i\)’s normal \(\mathbf{n}_i\). The projected distance \(d_j = |\mathbf{n}_i \cdot ({\mu}_j – {\mu}_i)|\) measures normal-direction displacement. If \(d_j\) is small (neighbor lies in the tangent plane), that neighbor contributes a low value to SNRI — it’s a legitimate surface coverage. If \(d_j\) is large (neighbor is stacked along the normal), it contributes close to 1 — it’s a true redundancy candidate. Averaging over \(K\) neighbors gives a score in \([0,1]\): near 0 for tangentially-spread primitives, near 1 for normal-direction stacks.
SNRI feeds directly into the adaptive entropy threshold \(\eta_k\) — the maximum entropy a neighborhood is allowed to have before the loss kicks in. High-SNRI neighborhoods get a lower threshold (they’re penalized earlier for having any entropy), while low-SNRI neighborhoods get a higher threshold (they’re given more slack because some complexity there is legitimate). The coupling weight \(\beta = -\sigma^2_{\min}\) ensures this scaling is normalized relative to primitive scale.
Image Entropy-Guided Weighting: Letting the Input Images Set the Rules
One fundamental tension in geometric regularization is between smoothness and detail. A flat concrete wall should be aggressively sparsified — you want one clean surface, no noise. A bicycle wheel with thin spokes needs flexibility — too much sparsification will erase the spokes entirely. Prior methods set this tradeoff by hand with global hyperparameters. GEF reads it from the input images themselves.
High image entropy at a pixel means that region is visually complex — lots of edges, fine texture, rich structure. Low image entropy means visual simplicity — smooth gradients, uniform color. After projecting each primitive’s image entropy (a weighted average over its influenced pixels), the inverse normalization in Eq. 9 maps high-entropy regions to low weights in the sparsity loss. The optimizer therefore applies less sparsification pressure where the images indicate complexity, and more where they indicate simplicity. The entire parameter tuning burden — how aggressively to sparsify different scene regions — is absorbed by the image data.
Component 2: Multi-Scale Entropy Alignment — Preventing Over-Smoothing at Boundaries
Even a well-calibrated sparsity loss can introduce excessive smoothing at geometric discontinuities: sharp edges, thin structures, surfaces with rapid curvature changes. GEF’s second component addresses this by enforcing cross-modal entropy distribution matching — the rendered 3D model’s opacity entropy profile should resemble the 2D image entropy profile at corresponding locations, across multiple spatial scales simultaneously.
Three scales of image entropy maps are computed: a fine scale (3×3 window) that captures textures and object boundaries, a medium scale (5×5 window) for intermediate structural features, and a coarse scale (9×9 window) for smooth regions and backgrounds. Rather than blending these uniformly, GEF uses a competitive softmax weighting with temperature parameter \(\beta = 6.0\):
The competitive mechanism means each spatial location gets dominated by one scale — the scale at which that location’s structural information is most concentrated. The paper validates this empirically: over 85% of pixels have a maximum weight above 0.6, confirming strong spatial specialization rather than diffuse blending. Fine scales win in textured regions and along sharp edges, preventing entropy minimization from flattening them. Coarse scales win in smooth regions, ensuring aggressive sparsification there. The temperature \(\beta = 6.0\) is the sweet spot between insufficient competition (scales blur together) and training instability (gradients become too sharp).
A practical efficiency trick makes this viable in training: entropy maps are computationally expensive to recompute every iteration. GEF exploits the fact that configurational entropy evolves much slower than individual parameter values — during a short training window, the geometry barely moves even though the photometric loss drives rapid color changes. Entropy maps are therefore cached and updated only every N=10–20 iterations, achieving a 3–4× training speedup with less than 1% quality degradation.
“Well-reconstructed surfaces can be characterized by low configurational entropy, where dominant primitives clearly define surface geometry while redundant components are suppressed.” — Kuang & Liu, ISPRS Journal of Photogrammetry and Remote Sensing, 2026
Results: What Entropy-Driven Optimization Actually Achieves
Mip-NeRF 360 — Rendering Quality
| Method | Indoor SSIM ↑ | Indoor LPIPS ↓ | Outdoor SSIM ↑ | Outdoor LPIPS ↓ | Avg SSIM ↑ | Avg LPIPS ↓ |
|---|---|---|---|---|---|---|
| 3DGS | 0.926 | 0.199 | 0.705 | 0.283 | 0.803 | 0.246 |
| 2DGS | 0.923 | 0.183 | 0.709 | 0.248 | 0.804 | 0.239 |
| GOF | 0.928 | 0.167 | 0.742 | 0.225 | 0.835 | 0.196 |
| GEF (ours) | 0.944 | 0.084 | 0.783 | 0.177 | 0.855 | 0.136 |
DTU Surface Reconstruction — Chamfer Distance
| Method | Type | Mean CD ↓ | Training Time |
|---|---|---|---|
| NeRF | Implicit | 1.49 | >12 h |
| NeuS | Implicit | 0.84 | >12 h |
| Neuralangelo | Implicit | 0.61 | >12 h |
| 3DGS | Explicit | 1.96 | 5.2 min |
| 2DGS | Explicit | 0.87 | 8.9 min |
| GOF | Explicit | 0.82 | 55 min |
| GEF (ours) | Explicit | 0.64 | 23 min |
Tanks and Temples — F1 Score
| Scene | NeuS | 2DGS | GOF | GEF |
|---|---|---|---|---|
| Barn | 0.29 | 0.45 | 0.51 | 0.48 |
| Ignatius | 0.83 | 0.50 | 0.68 | 0.75 |
| Truck | 0.45 | 0.48 | 0.58 | 0.56 |
| Mean | 0.38 | 0.32 | 0.46 | 0.44 |
| Time | >24 h | 15.5 min | >1 h | 50 min |
A few things stand out in these numbers. On Mip-NeRF 360, GEF is not the best in PSNR (it ranks third at 27.40) — the entropy computation introduces a small amount of noise that slightly hurts pixel-wise accuracy. But it dominates in SSIM and LPIPS by a considerable margin, which means the rendered images are perceptually superior: sharper structural detail, better preservation of fine-grained texture, fewer ghosting artifacts. The distinction between pixel accuracy and perceptual quality is meaningful, and GEF firmly prioritizes the latter.
On DTU Chamfer Distance, GEF achieves 0.64 — best among all Gaussian-Splatting-based methods and competitive with Neuralangelo (0.61), which requires over 12 hours of training. GEF trains in 23 minutes. That’s the efficiency argument for entropy-based sparsification over implicit neural surface methods.
Ablation: Every Component Earns Its Place
| Setting | F1 ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| w/o Sparsity Constraint | 0.37 | 23.51 | 0.850 | 0.196 |
| w/o Image Entropy Guidance | 0.40 | 24.62 | 0.865 | 0.170 |
| w/o Multi-Scale Alignment | 0.42 | 25.22 | 0.872 | 0.157 |
| Full GEF model | 0.44 | 25.89 | 0.870 | 0.150 |
The sparsity constraint removal causes the largest F1 drop (0.44 → 0.37, a 15.9% collapse), confirming it is the foundation of geometric quality. Removing image entropy guidance drops F1 by 9.1% — uniform sparsification breaks spatial coherence across complex regions. Multi-scale alignment removal costs 4.5% in F1 — cross-scale entropy matching prevents the smoothing that would otherwise erase fine geometric detail. The gains are synergistic rather than additive, meaning each component addresses a failure mode the others cannot compensate for.
Complete End-to-End GEF Implementation (PyTorch)
The implementation covers all components from the paper in 12 sections: 3D Gaussian primitive representation and differentiable alpha-blending, the SNRI (Surface Neighborhood Redundancy Index) computation, image entropy-guided weighting, the entropy-regularized sparsity loss, multi-scale entropy alignment with competitive softmax weighting, delayed entropy caching for training efficiency, depth and normal consistency losses, the complete GEF loss function, mesh extraction via TSDF fusion and Marching Cubes, the full GEF trainer, hyperparameter sensitivity utilities, and a smoke test.
# ==============================================================================
# GEF: Gaussian Entropy Fields — Driving Adaptive Sparsity in 3D Gaussian Optimization
# Paper: ISPRS Journal of Photogrammetry and Remote Sensing 236 (2026) 273-285
# Authors: Hong Kuang, Jianchen Liu
# Affiliation: Shandong University of Science and Technology, China
# DOI: https://doi.org/10.1016/j.isprsjprs.2026.04.010
# ==============================================================================
# Sections:
# 1. Imports & Configuration
# 2. 3D Gaussian Primitive Representation (Eq. 1-3)
# 3. Differentiable Alpha-Blending Renderer
# 4. SNRI: Surface Neighborhood Redundancy Index (Eq. 6)
# 5. Image Entropy Computation (Eq. 7-9)
# 6. Opacity Entropy in 3D Neighborhoods (Eq. 4-5, 10-11)
# 7. Multi-Scale Entropy Alignment (Eq. 13-16)
# 8. Delayed Entropy Cache (Eq. 15-16)
# 9. Depth and Normal Consistency Losses (Eq. 12, 17)
# 10. Complete GEF Loss Function (Eq. 18-19)
# 11. GEF Trainer + Optimization Schedule
# 12. Mesh Extraction (TSDF + Marching Cubes) & Smoke Test
# ==============================================================================
from __future__ import annotations
import math, warnings
from typing import Dict, List, Optional, Tuple
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
import numpy as np
warnings.filterwarnings("ignore")
# ─── SECTION 1: Configuration ─────────────────────────────────────────────────
class GEFConfig:
"""
GEF hyperparameters.
All defaults match the paper's experimental setup.
"""
# Scene initialization
num_gaussians: int = 100_000 # initial primitive count (grows via densification)
img_h: int = 540
img_w: int = 960
# Neighborhood entropy
K: int = 50 # K-nearest neighbors per primitive (Eq. 10)
sigma_min: float = 0.005 # minimum primitive scale σ_min
eps_snri: float = 0.5 # ε for SNRI threshold (in units of σ_min)
alpha_snri: float = 200.0 # α decay steepness in SNRI (= 1/σ²_min)
eps_entropy: float = 1e-6 # numerical stability for entropy
# Multi-scale entropy alignment
num_scales: int = 3 # L=3: fine(k=3), medium(k=5), coarse(k=9)
beta_temperature: float = 6.0 # β in competitive softmax weighting (Eq. 13)
lambda_scales: List = [1.0, 1.0, 1.0] # λ_l per scale
entropy_update_interval: int = 10 # N: update cached entropy every N iters
# Loss weights
lambda_sparsity: float = 0.1 # λ_sparsity
lambda_align: float = 0.05 # λ_align
lambda_depth: float = 100.0 # depth consistency weight
lambda_normal: float = 0.05 # normal consistency weight
lambda_ssim: float = 0.2 # SSIM weight in photometric loss
# Training schedule (iter counts)
geom_start_iter: int = 15_000 # geometric regularization start
entropy_start_iter: int = 20_000 # entropy field start + k-NN graph fixed
total_iters: int = 30_000
# Mesh extraction
tsdf_voxel_size: float = 0.02 # TSDF voxel size for mesh extraction
def __init__(self, **kwargs):
for k, v in kwargs.items(): setattr(self, k, v)
@property
def epsilon(self):
return self.eps_snri * self.sigma_min
@property
def beta_coupling(self):
return -self.sigma_min ** 2
# ─── SECTION 2: 3D Gaussian Primitive Representation ──────────────────────────
class GaussianPrimitives(nn.Module):
"""
Learnable 3D Gaussian Splatting scene representation (Eq. 1-3).
Each Gaussian primitive is parameterized by:
μ ∈ R³ : center position (world coordinates)
s ∈ R³ : log-scale (exponentiated to get scale vector)
q ∈ R⁴ : quaternion rotation (normalized → rotation matrix R)
c ∈ R³ : RGB color (spherical harmonic degree 0: view-independent)
α ∈ R : pre-sigmoid opacity logit
Covariance: Σ = R·diag(s)·diag(s)ᵀ·Rᵀ (Eq. 1)
2D projection: Σ' = JWΣWᵀJᵀ (Eq. 2)
Alpha-blending: C(p') = Σ c_i ω_i Π(1-ω_j) (Eq. 3)
All parameters are optimized end-to-end via photometric loss.
"""
def __init__(self, n: int, init_scale: float = 0.01):
super().__init__()
self.register_parameter('means', nn.Parameter(torch.randn(n, 3) * 0.5))
self.register_parameter('log_scales', nn.Parameter(torch.full((n, 3), math.log(init_scale))))
self.register_parameter('quats', nn.Parameter(self._init_quats(n)))
self.register_parameter('colors', nn.Parameter(torch.rand(n, 3)))
self.register_parameter('opacity_logits', nn.Parameter(torch.zeros(n)))
@staticmethod
def _init_quats(n: int) -> Tensor:
"""Initialize quaternions to identity rotation."""
q = torch.zeros(n, 4)
q[:, 0] = 1.0 # w=1, x=y=z=0 → identity rotation
return q
@property
def scales(self) -> Tensor:
"""Positive scale vector (exponentiated log-scales)."""
return torch.exp(self.log_scales) # (N, 3), always positive
@property
def opacities(self) -> Tensor:
"""Opacity in [0, 1] via sigmoid."""
return torch.sigmoid(self.opacity_logits) # (N,)
@property
def normals(self) -> Tensor:
"""
Surface normal = principal axis (eigenvector corresponding to
smallest eigenvalue of covariance matrix Σ).
For a diagonal covariance, this is the axis with smallest scale.
Uses quaternion → rotation matrix, then picks the min-scale column.
"""
R = self._quat_to_rot(self.quats) # (N, 3, 3)
min_idx = self.scales.argmin(dim=-1) # (N,) which axis is thinnest
batch_idx = torch.arange(self.means.shape[0], device=self.means.device)
normals = R[batch_idx, :, min_idx] # (N, 3)
return F.normalize(normals, dim=-1)
def covariance_3d(self) -> Tensor:
"""Compute 3D covariance matrices Σ = R·S·Sᵀ·Rᵀ (Eq. 1)."""
R = self._quat_to_rot(self.quats) # (N, 3, 3)
S = torch.diag_embed(self.scales) # (N, 3, 3) diagonal
RS = torch.bmm(R, S) # (N, 3, 3)
return torch.bmm(RS, RS.transpose(-2, -1)) # (N, 3, 3) = RSSᵀRᵀ
@staticmethod
def _quat_to_rot(q: Tensor) -> Tensor:
"""
Convert unit quaternions to 3×3 rotation matrices.
q: (N, 4) as (w, x, y, z), normalized.
"""
q = F.normalize(q, dim=-1)
w, x, y, z = q.unbind(dim=-1)
R = torch.stack([
1 - 2*(y*y + z*z), 2*(x*y - w*z), 2*(x*z + w*y),
2*(x*y + w*z), 1 - 2*(x*x + z*z), 2*(y*z - w*x),
2*(x*z - w*y), 2*(y*z + w*x), 1 - 2*(x*x + y*y),
], dim=-1).reshape(-1, 3, 3)
return R
# ─── SECTION 3: Differentiable Alpha-Blending Renderer ────────────────────────
class SimpleGaussianRenderer(nn.Module):
"""
Simplified differentiable renderer for GEF training (Eq. 2-3).
This implementation uses ray-casting for correctness and clarity.
For production: use the official 3DGS CUDA rasterizer from
https://github.com/graphdeco-inria/gaussian-splatting
Per-pixel color: C(p') = Σ_i c_i ω_i Π_{j
def __init__(self, cfg: GEFConfig):
super().__init__()
self.cfg = cfg
def project_gaussians(
self,
gaussians: GaussianPrimitives,
camera_intrinsics: Dict[str, float],
camera_pose: Tensor, # (4, 4) world-to-camera transform W
) -> Tuple[Tensor, Tensor, Tensor]:
"""
Project 3D Gaussians to 2D image plane (Eq. 2).
Returns: 2D means (N, 2), 2D covariances (N, 2, 2), depths (N,)
"""
N = gaussians.means.shape[0]
fx, fy = camera_intrinsics['fx'], camera_intrinsics['fy']
cx, cy = camera_intrinsics['cx'], camera_intrinsics['cy']
# Transform means to camera space
ones = torch.ones(N, 1, device=gaussians.means.device)
means_h = torch.cat([gaussians.means, ones], dim=-1) # (N, 4)
means_cam = (camera_pose @ means_h.T).T[:, :3] # (N, 3)
depths = means_cam[:, 2].clamp(min=0.01) # (N,)
# Project to image plane
u = fx * means_cam[:, 0] / depths + cx # (N,)
v = fy * means_cam[:, 1] / depths + cy # (N,)
means_2d = torch.stack([u, v], dim=-1) # (N, 2)
# Project 3D covariance to 2D via J·W·Σ·Wᵀ·Jᵀ (Eq. 2)
W_rot = camera_pose[:3, :3] # (3, 3)
Sigma_3d = gaussians.covariance_3d() # (N, 3, 3)
Sigma_cam = W_rot @ Sigma_3d @ W_rot.T # (N, 3, 3) broadcast
# Jacobian of perspective projection
J = torch.zeros(N, 2, 3, device=gaussians.means.device)
J[:, 0, 0] = fx / depths
J[:, 0, 2] = -fx * means_cam[:, 0] / (depths**2)
J[:, 1, 1] = fy / depths
J[:, 1, 2] = -fy * means_cam[:, 1] / (depths**2)
Sigma_2d = torch.bmm(torch.bmm(J, Sigma_cam), J.transpose(-2, -1)) # (N, 2, 2)
# Add small regularization for numerical stability
eye2 = torch.eye(2, device=Sigma_2d.device).unsqueeze(0) * 0.3
Sigma_2d = Sigma_2d + eye2
return means_2d, Sigma_2d, depths
def render(
self,
gaussians: GaussianPrimitives,
camera_intrinsics: Dict,
camera_pose: Tensor,
H: int, W: int,
) -> Dict[str, Tensor]:
"""
Simplified tile-based rendering.
Returns: rgb (H,W,3), depth (H,W), normal (H,W,3), alpha_map (H,W), opacity_entropy (H,W)
For scale: production code should use CUDA rasterizer.
This CPU/GPU portable implementation is for training integration clarity.
"""
means_2d, sigmas_2d, depths = self.project_gaussians(gaussians, camera_intrinsics, camera_pose)
# Visibility filter: keep only Gaussians with depth > 0 and in frustum
visible = (depths > 0.01) & (means_2d[:, 0] > 0) & (means_2d[:, 0] < W) \
& (means_2d[:, 1] > 0) & (means_2d[:, 1] < H)
vis_idx = visible.nonzero(as_tuple=True)[0]
if vis_idx.numel() == 0:
return {'rgb': torch.zeros(H, W, 3), 'depth': torch.zeros(H, W),
'normal': torch.zeros(H, W, 3), 'alpha_map': torch.zeros(H, W),
'opacity_entropy': torch.zeros(H, W)}
# Sort visible Gaussians by depth (front to back for alpha compositing)
vis_depths = depths[vis_idx]
sort_idx = vis_depths.argsort()
idx_sorted = vis_idx[sort_idx]
# Rasterize: compute per-pixel Gaussian contributions
rgb_out = torch.zeros(H * W, 3, device=gaussians.means.device)
depth_out = torch.zeros(H * W, device=gaussians.means.device)
normal_out = torch.zeros(H * W, 3, device=gaussians.means.device)
transmit = torch.ones(H * W, device=gaussians.means.device)
alpha_acc = torch.zeros(H * W, device=gaussians.means.device)
H_alpha_acc = torch.zeros(H * W, device=gaussians.means.device)
# Pixel grid
yy, xx = torch.meshgrid(torch.arange(H, device=gaussians.means.device, dtype=torch.float32),
torch.arange(W, device=gaussians.means.device, dtype=torch.float32),
indexing='ij')
pixels = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=-1) # (HW, 2)
normals = gaussians.normals # (N, 3)
# Alpha-blending (Eq. 3) — process each Gaussian in depth order
# For production use CUDA tile rasterizer; this loop illustrates the math
for g_idx in idx_sorted[:min(500, len(idx_sorted))]: # cap for smoke test
mu2d = means_2d[g_idx] # (2,)
sig2d = sigmas_2d[g_idx] # (2, 2)
alpha_g = gaussians.opacities[g_idx] # scalar
color_g = gaussians.colors[g_idx] # (3,)
depth_g = depths[g_idx] # scalar
normal_g = normals[g_idx] # (3,)
# Evaluate 2D Gaussian at all pixel positions
diff = pixels - mu2d.unsqueeze(0) # (HW, 2)
try:
sig_inv = torch.linalg.inv(sig2d)
except:
sig_inv = torch.eye(2, device=sig2d.device)
mahal = (diff @ sig_inv * diff).sum(dim=-1) # (HW,) Mahalanobis²
gauss_val = torch.exp(-0.5 * mahal.clamp(max=20)) # (HW,)
omega = alpha_g * gauss_val # (HW,) contribution
omega_weighted = transmit * omega # (HW,) visible contribution
rgb_out += omega_weighted.unsqueeze(-1) * color_g.unsqueeze(0)
depth_out += omega_weighted * depth_g
normal_out += omega_weighted.unsqueeze(-1) * normal_g.unsqueeze(0)
alpha_acc += omega_weighted
# Per-pixel opacity entropy accumulation (for entropy alignment)
p_safe = omega_weighted.clamp(min=1e-8)
H_alpha_acc -= omega_weighted * torch.log(p_safe) # -ω log ω
transmit = transmit * (1.0 - omega) # update transmittance
return {
'rgb': rgb_out.reshape(H, W, 3),
'depth': depth_out.reshape(H, W),
'normal': F.normalize(normal_out, dim=-1).reshape(H, W, 3),
'alpha_map': alpha_acc.reshape(H, W),
'opacity_entropy': H_alpha_acc.reshape(H, W),
}
# ─── SECTION 4: SNRI — Surface Neighborhood Redundancy Index ──────────────────
def compute_snri(
means: Tensor, # (N, 3) primitive centers
normals: Tensor, # (N, 3) primitive surface normals
knn_idx: Tensor, # (N, K) pre-computed k-nearest-neighbor indices
alpha_decay: float,
epsilon: float,
) -> Tensor:
"""
Surface Neighborhood Redundancy Index (Eq. 6).
SNRI(i) = (1/K) Σ_j exp(-α · [ε - |n_i · (μ_j - μ_i)|]₊)
Geometric interpretation:
d_j = |n_i · (μ_j - μ_i)| measures how far neighbor j is from
the surface tangent plane of primitive i.
d_j small → j is tangential (same surface) → contribution is LOW
d_j large → j is stacked along normal → contribution is HIGH (≈1)
Output: SNRI ∈ [0,1] per primitive
→ Low SNRI: tangentially distributed (legitimate surface coverage)
→ High SNRI: normal-direction stacking (true geometric redundancy)
This gating prevents over-sparsification of meaningful surface detail.
"""
N, K = knn_idx.shape
mu_neighbors = means[knn_idx.reshape(-1)].reshape(N, K, 3) # (N, K, 3)
delta = mu_neighbors - means.unsqueeze(1) # (N, K, 3)
# Projected distance along surface normal of primitive i
d_j = (normals.unsqueeze(1) * delta).sum(dim=-1).abs() # (N, K) |n_i · Δμ|
# Per-neighbor contribution: exp(-α · [ε - d_j]₊)
hinge = F.relu(epsilon - d_j) # (N, K), zero when d_j > ε
contribution = torch.exp(-alpha_decay * hinge) # (N, K) ∈ [0, 1]
snri = contribution.mean(dim=-1) # (N,)
return snri
def build_knn_graph(means: Tensor, K: int) -> Tensor:
"""
Build K-nearest-neighbor graph over Gaussian centers.
Fixed at iteration 20,000 after primitives stabilize near surfaces.
One-time cost, amortized over remaining 10,000 iterations.
Returns: knn_idx (N, K) — indices of K nearest neighbors per primitive.
"""
N = means.shape[0]
# Pairwise distance matrix (efficient via |a-b|² = |a|² - 2aᵀb + |b|²)
dist2 = (means.unsqueeze(0) - means.unsqueeze(1)).pow(2).sum(dim=-1) # (N, N)
# Exclude self (set diagonal to infinity)
dist2.fill_diagonal_(float('inf'))
# K nearest by distance
_, knn_idx = dist2.topk(K, dim=-1, largest=False) # (N, K)
return knn_idx.detach() # detach: topology is fixed, not optimized
# ─── SECTION 5: Image Entropy Computation ─────────────────────────────────────
def compute_image_entropy(image: Tensor, window_size: int) -> Tensor:
"""
Local image entropy via sliding-window analysis (Eq. 7).
E_img(x,y) = -Σ_{(u,v)∈Ω_{k×k}(x,y)} p(u,v) log₂ p(u,v)
where p(u,v) is the normalized intensity histogram probability in
the k×k window centered at (x,y).
Implementation: unfold the image into patches, compute histogram
entropy per patch, fold back to image shape.
image: (H, W) grayscale or (H, W, C) → averaged to grayscale
Returns: (H, W) entropy map
"""
if image.dim() == 3:
image = image.mean(dim=-1) # (H, W) grayscale
H, W = image.shape
pad = window_size // 2
image_padded = F.pad(image.unsqueeze(0).unsqueeze(0), [pad]*4, mode='reflect')[0, 0]
# Unfold into overlapping patches
patches = image_padded.unfold(0, window_size, 1).unfold(1, window_size, 1)
patches = patches.reshape(H, W, -1) # (H, W, k*k)
# Discretize to 32 bins for histogram entropy
num_bins = 32
patches_q = (patches * (num_bins - 1)).long().clamp(0, num_bins - 1)
entropy_map = torch.zeros(H, W, device=image.device)
for b in range(num_bins):
p = (patches_q == b).float().mean(dim=-1) # (H, W) histogram prob at bin b
safe_p = p.clamp(min=1e-10)
entropy_map -= p * torch.log2(safe_p) # -p log₂ p (only nonzero where p>0)
return entropy_map
def compute_primitive_image_entropy(
alpha_map: Tensor, # (H, W) rendered opacity map
image_entropy: Tensor, # (H, W) image entropy map
means_2d: Tensor, # (N, 2) projected Gaussian centers
H: int, W: int,
) -> Tensor:
"""
Per-primitive image entropy (Eq. 8): weighted average over influenced pixels.
E_img_k = Σ_{x,y} α_k(x,y) · E_img(x,y) / Σ_{x,y} α_k(x,y)
For efficiency, approximates each primitive's pixel influence as a small
window around its projected center (avoids full alpha map per primitive).
"""
N = means_2d.shape[0]
primitive_entropy = torch.zeros(N, device=alpha_map.device)
r = 5 # local window radius around projected center
for k in range(N):
cx, cy = means_2d[k, 0].long().clamp(0, W-1), means_2d[k, 1].long().clamp(0, H-1)
x0, x1 = (cx - r).clamp(0, W-1), (cx + r).clamp(0, W-1)
y0, y1 = (cy - r).clamp(0, H-1), (cy + r).clamp(0, H-1)
alpha_window = alpha_map[y0:y1+1, x0:x1+1]
entropy_window = image_entropy[y0:y1+1, x0:x1+1]
denom = alpha_window.sum() + 1e-8
primitive_entropy[k] = (alpha_window * entropy_window).sum() / denom
return primitive_entropy # (N,)
# ─── SECTION 6: Opacity Entropy in 3D Neighborhoods ──────────────────────────
def compute_neighborhood_opacity_entropy(
opacities: Tensor, # (N,) primitive opacities
knn_idx: Tensor, # (N, K) neighbor indices
eps: float = 1e-8,
) -> Tensor:
"""
Neighborhood-based entropy (Eq. 10 inner term).
For each primitive k, treats the opacities of its K neighbors as
a probability distribution and computes Shannon entropy.
This is the 1280× faster approximation to ray-based opacity entropy
(Eq. 5): the same redundancy manifests whether you look along a ray
or within a spatial neighborhood of the redundant primitives.
Returns: H_alpha (N,) neighborhood entropy per primitive.
"""
N, K = knn_idx.shape
# Gather neighbor opacities
neighbor_opacities = opacities[knn_idx.reshape(-1)].reshape(N, K) # (N, K)
# Normalize to pseudo-probability distribution per neighborhood
total = neighbor_opacities.sum(dim=-1, keepdim=True) + eps
alpha_tilde = neighbor_opacities / total # (N, K)
# Shannon entropy: H(α) = -Σ ã log ã
safe_p = alpha_tilde.clamp(min=eps)
H_alpha = -(alpha_tilde * safe_p.log()).sum(dim=-1) # (N,)
return H_alpha
def compute_adaptive_threshold(
snri: Tensor, # (N,)
sigma_min: float,
epsilon_stab: float,
beta_coupling: float,
) -> Tensor:
"""
Adaptive entropy threshold η_k (Eq. 11).
η_k = ½ ln(2πe σ²_min) + ε + β·SNRI_k
The base threshold ½ ln(2πe σ²_min) is the differential entropy of a
Gaussian with variance σ²_min — the minimum entropy for a well-defined
primitive. β = -σ²_min lowers the threshold for high-SNRI primitives
(normal-stacked → penalized earlier) and raises it for low-SNRI ones
(tangential → given more flexibility).
"""
base = 0.5 * math.log(2 * math.pi * math.e * sigma_min**2)
eta = base + epsilon_stab + beta_coupling * snri # (N,)
return eta
class EntropySparsityLoss(nn.Module):
"""
Entropy-regularized sparsity constraint (Eq. 10-11, Section 3.2).
L_sparsity = Σ_k w_img_k · max(0, H(α_k) - η_k)²
Only penalizes primitives whose neighborhood entropy EXCEEDS the
adaptive threshold η_k. The hinge ensures no gradient flows for
well-behaved, already-sparse neighborhoods.
The image entropy weights w_img_k provide spatial adaptivity:
- Low weight at complex image regions → relax sparsification pressure
- High weight at simple image regions → aggressive sparsification
"""
def __init__(self, cfg: GEFConfig):
super().__init__()
self.cfg = cfg
def forward(
self,
gaussians: GaussianPrimitives,
knn_idx: Tensor,
primitive_img_entropy: Tensor, # (N,) per-primitive image entropy
) -> Tensor:
"""Returns scalar sparsity loss."""
cfg = self.cfg
N = gaussians.means.shape[0]
# SNRI: identify truly redundant (normal-stacked) primitives
snri = compute_snri(
gaussians.means, gaussians.normals, knn_idx,
cfg.alpha_snri, cfg.epsilon
) # (N,)
# Neighborhood opacity entropy
H_alpha = compute_neighborhood_opacity_entropy(
gaussians.opacities, knn_idx, eps=cfg.eps_entropy
) # (N,)
# Adaptive entropy threshold η_k
eta = compute_adaptive_threshold(
snri, cfg.sigma_min, cfg.eps_entropy, cfg.beta_coupling
) # (N,)
# Image entropy-guided weight (Eq. 9): inverse normalization
E_min = primitive_img_entropy.min()
E_max = primitive_img_entropy.max() + cfg.eps_entropy
w_img = 1.0 - (primitive_img_entropy - E_min) / (E_max - E_min) # (N,) ∈ [0,1]
# Exclude tangential primitives (low SNRI) from sparsification gate
# (they provide legitimate surface coverage, not redundancy)
tangential_mask = snri < 0.3 # primitives spreading along surface → preserve
w_img = w_img * (~tangential_mask).float()
# Hinge loss: penalize entropy exceeding threshold (Eq. 10)
excess = F.relu(H_alpha - eta) # (N,) only positive violations
loss = (w_img * excess.pow(2)).sum() # weighted squared hinge
return loss
# ─── SECTION 7: Multi-Scale Entropy Alignment ─────────────────────────────────
class MultiScaleEntropyAlignment(nn.Module):
"""
Multi-scale entropy alignment with competitive cross-scale weighting (Eq. 13-14, Section 3.3).
Prevents excessive smoothing at geometric discontinuities by enforcing
that rendered opacity entropy matches image entropy at corresponding
locations across multiple spatial scales.
Three scales:
l=1: k=3×3 fine (textures, sharp edges) → wins in high-detail regions
l=2: k=5×5 medium (intermediate structure) → wins in moderate regions
l=3: k=9×9 coarse (smooth, homogeneous areas) → wins in flat regions
Competitive softmax ensures spatial exclusivity:
>85% of pixels dominated by one scale (concentration metric > 0.6).
"""
def __init__(self, cfg: GEFConfig):
super().__init__()
self.cfg = cfg
self.window_sizes = [2*l + 1 for l in range(1, cfg.num_scales + 1)] # [3, 5, 7]
def compute_multi_scale_image_entropy(self, image: Tensor) -> List[Tensor]:
"""Compute image entropy maps at each scale."""
entropy_maps = []
for k in self.window_sizes:
E = compute_image_entropy(image, window_size=k) # (H, W)
entropy_maps.append(E)
return entropy_maps # L × (H, W)
def competitive_weights(self, entropy_maps: List[Tensor]) -> List[Tensor]:
"""
Compute competitive softmax weights across scales (Eq. 13).
W_l(x,y) = exp(β·E^l_norm) / Σ_j exp(β·E^j_norm)
E^l_norm: percentile-normalized entropy at scale l (per-scale,
to avoid global normalization artifacts from different magnitudes
at different window sizes).
"""
beta = self.cfg.beta_temperature
L = len(entropy_maps)
H, W = entropy_maps[0].shape
# Percentile-normalize each scale independently
normed = []
for E in entropy_maps:
p5 = torch.quantile(E, 0.05)
p95 = torch.quantile(E, 0.95)
E_n = (E - p5) / (p95 - p5 + 1e-8)
normed.append(E_n.clamp(0, 1))
# Stack and apply competitive softmax (Eq. 13)
stacked = torch.stack(normed, dim=0) # (L, H, W)
weights = F.softmax(beta * stacked, dim=0) # (L, H, W), sums to 1 across scales
return [weights[l] for l in range(L)] # L × (H, W)
def forward(
self,
rendered_entropy: Tensor, # (H, W) rendered opacity entropy map
image: Tensor, # (H, W) or (H, W, 3) ground-truth image
) -> Tensor:
"""
Multi-scale alignment loss (Eq. 14).
L_align = Σ_l λ_l · Σ_i W^l_i · |H^l_i - E^l_i|²
H^l_i: rendered opacity entropy at scale l, pixel i
(approximated by blurring the pixel-wise rendered entropy)
E^l_i: image entropy at scale l, pixel i
W^l_i: competitive scale weight at scale l, pixel i
"""
H, W = rendered_entropy.shape
cfg = self.cfg
# Multi-scale image entropy maps
target_entropies = self.compute_multi_scale_image_entropy(image) # L × (H, W)
# Competitive weights
weights = self.competitive_weights(target_entropies) # L × (H, W)
total_loss = torch.tensor(0.0, device=rendered_entropy.device)
for l, (target_E, weight_W, lam) in enumerate(
zip(target_entropies, weights, cfg.lambda_scales)):
# Multi-scale rendered entropy: blur the rendered entropy map at scale l
k = self.window_sizes[l]
pad = k // 2
rendered_blurred = F.avg_pool2d(
rendered_entropy.unsqueeze(0).unsqueeze(0),
kernel_size=k, stride=1, padding=pad
)[0, 0] # (H, W)
# Weighted MSE between rendered and target entropy at this scale
diff = (rendered_blurred - target_E).pow(2) # (H, W)
total_loss = total_loss + lam * (weight_W * diff).sum()
return total_loss
# ─── SECTION 8: Delayed Entropy Cache ─────────────────────────────────────────
class EntropyCache:
"""
Delayed entropy update strategy (Eq. 15-16, Section 3.3).
Configurational entropy distributions evolve much slower than
individual Gaussian parameters during optimization. This temporal
scale separation justifies a computationally efficient strategy:
θ̇ = -∇_θ L(θ, H_cached) [optimize under fixed entropy maps]
H_cached(t+Δt) = H_render(θ(t+Δt)) [update every N iterations]
Achieves 3-4× speedup with <1% quality degradation when N=10-20.
"""
def __init__(self, update_interval: int = 10):
self.N = update_interval
self._cached_entropy: Optional[Tensor] = None
self._last_update: int = -1
def update_if_needed(self, iteration: int, new_entropy: Tensor) -> Tensor:
"""Return cached entropy, refreshing every N iterations."""
if (self._cached_entropy is None or
(iteration - self._last_update) >= self.N):
self._cached_entropy = new_entropy.detach() # detach from gradient graph
self._last_update = iteration
return self._cached_entropy
@property
def has_cache(self) -> bool:
return self._cached_entropy is not None
# ─── SECTION 9: Depth and Normal Consistency Losses ───────────────────────────
def depth_consistency_loss(
depth_map: Tensor, # (H, W) rendered depth
alpha_map: Tensor, # (H, W) rendered opacity (blending weights)
) -> Tensor:
"""
Depth consistency loss from 2DGS (Eq. 12).
L_depth = Σ_{i,j∈r} ω_i ω_j ‖d_i - d_j‖²₂
Penalizes variance in depth estimates within local ray neighborhoods,
promoting convergence toward sharp, consistent depth estimates.
Approximated here over local pixel neighborhoods (3×3 windows)
weighted by rendered opacity to match the ray-based formulation.
"""
depth = depth_map.unsqueeze(0).unsqueeze(0) # (1, 1, H, W)
alpha = alpha_map.unsqueeze(0).unsqueeze(0) # (1, 1, H, W)
# Local depth mean via average pooling (neighborhood estimate of depth)
depth_mean = F.avg_pool2d(depth, kernel_size=3, stride=1, padding=1)
depth_var = F.avg_pool2d(depth**2, kernel_size=3, stride=1, padding=1) - depth_mean**2
depth_var = depth_var.clamp(min=0)
# Weight by opacity: inconsistency in low-alpha regions matters less
alpha_weight = alpha.detach()
loss = (alpha_weight * depth_var).mean()
return loss
def normal_consistency_loss(normal_map: Tensor, depth_map: Tensor) -> Tensor:
"""
Normal consistency loss (Eq. 17).
L_normal = Σ_{i∈r} ω_i (1 - n_i^⊤ n')
n' = surface normal from finite differences on depth map
n_i = Gaussian primitive's normal (principal axis of covariance)
The term (1 - n_i^⊤ n') measures angular difference between the
predicted Gaussian normal and the depth-derived surface orientation.
Encourages Gaussian primitives to align tangentially to the surface.
"""
# Finite difference normals from depth map
# n' = normalize([-dZ/dx, -dZ/dy, 1])
dz_dx = depth_map[:, 1:] - depth_map[:, :-1] # (H, W-1)
dz_dy = depth_map[1:, :] - depth_map[:-1, :] # (H-1, W)
# Pad to restore original size
dz_dx = F.pad(dz_dx, [0, 1]) # (H, W)
dz_dy = F.pad(dz_dy, [0, 0, 0, 1]) # (H, W)
normal_fd = torch.stack([-dz_dx, -dz_dy, torch.ones_like(dz_dx)], dim=-1)
normal_fd = F.normalize(normal_fd, dim=-1) # (H, W, 3)
# Angular difference: 1 - n^⊤ n' (zero when perfectly aligned)
cos_sim = (normal_map * normal_fd).sum(dim=-1) # (H, W)
loss = (1.0 - cos_sim).mean()
return loss
# ─── SECTION 10: Complete GEF Loss Function ───────────────────────────────────
class GEFLoss(nn.Module):
"""
Complete GEF training loss (Eq. 18-19, Section 3.4).
L = L_photometric + L_entropy + L_geom
L_photometric = (1-λ_ssim)·L1 + λ_ssim·(1-SSIM) [standard 3DGS loss]
L_entropy = λ_sparsity·L_sparsity + λ_align·L_align [entropy field losses]
L_geom = L_depth + λ_normal·L_normal [geometric consistency]
Training schedule (critical for stable convergence):
iter 0-15k: photometric only (warm up geometry)
iter 15k+: + geometric losses (align normals with surface)
iter 20k+: + entropy losses (sparsify redundant primitives)
The delayed entropy activation ensures primitives are already near surfaces
before entropy-driven sparsification begins, preventing premature pruning.
"""
def __init__(self, cfg: GEFConfig):
super().__init__()
self.cfg = cfg
self.sparsity_loss = EntropySparsityLoss(cfg)
self.align_loss = MultiScaleEntropyAlignment(cfg)
@staticmethod
def ssim_loss(rendered: Tensor, gt: Tensor, window_size: int = 11) -> Tensor:
"""Simplified SSIM-based perceptual loss."""
C1, C2 = 0.01**2, 0.03**2
mu_r = F.avg_pool2d(rendered, window_size, stride=1, padding=window_size//2)
mu_g = F.avg_pool2d(gt, window_size, stride=1, padding=window_size//2)
var_r = F.avg_pool2d(rendered**2, window_size, stride=1, padding=window_size//2) - mu_r**2
var_g = F.avg_pool2d(gt**2, window_size, stride=1, padding=window_size//2) - mu_g**2
cov = F.avg_pool2d(rendered * gt, window_size, stride=1, padding=window_size//2) - mu_r * mu_g
ssim = ((2 * mu_r * mu_g + C1) * (2 * cov + C2)) / \
((mu_r**2 + mu_g**2 + C1) * (var_r + var_g + C2))
return 1.0 - ssim.mean()
def photometric_loss(self, rendered_rgb: Tensor, gt_rgb: Tensor) -> Tensor:
"""Standard 3DGS photometric loss: L1 + SSIM."""
l1 = F.l1_loss(rendered_rgb, gt_rgb)
ssim = self.ssim_loss(
rendered_rgb.permute(2, 0, 1).unsqueeze(0),
gt_rgb.permute(2, 0, 1).unsqueeze(0),
)
return (1.0 - self.cfg.lambda_ssim) * l1 + self.cfg.lambda_ssim * ssim
def forward(
self,
rendered: Dict[str, Tensor], # renderer output dict
gt_rgb: Tensor, # (H, W, 3) ground-truth image
gaussians: GaussianPrimitives,
knn_idx: Optional[Tensor],
primitive_img_entropy: Optional[Tensor],
entropy_cache: Optional[EntropyCache],
iteration: int,
) -> Dict[str, Tensor]:
"""
Compute all GEF losses with training schedule gating.
Returns dict with 'total', 'photometric', 'sparsity', 'align',
'depth', 'normal' keys (individual losses for logging).
"""
cfg = self.cfg
losses = {}
# ── L_photometric (always active) ────────────────────────────────────
l_photo = self.photometric_loss(rendered['rgb'], gt_rgb)
losses['photometric'] = l_photo
total = l_photo
# ── L_geom (active from iter 15,000) ─────────────────────────────────
l_depth, l_normal = torch.tensor(0.0), torch.tensor(0.0)
if iteration >= cfg.geom_start_iter:
l_depth = depth_consistency_loss(rendered['depth'], rendered['alpha_map'])
l_normal = normal_consistency_loss(rendered['normal'], rendered['depth'])
total = total + cfg.lambda_depth * l_depth + cfg.lambda_normal * l_normal
losses['depth'] = l_depth
losses['normal'] = l_normal
# ── L_entropy (active from iter 20,000, requires k-NN graph) ─────────
l_sparse, l_align = torch.tensor(0.0), torch.tensor(0.0)
if iteration >= cfg.entropy_start_iter and knn_idx is not None:
# Entropy sparsity: penalize high-entropy neighborhoods
if primitive_img_entropy is not None:
l_sparse = self.sparsity_loss(gaussians, knn_idx, primitive_img_entropy)
# Entropy alignment: match rendered entropy distribution to image entropy
cached_entropy = rendered['opacity_entropy']
if entropy_cache is not None:
cached_entropy = entropy_cache.update_if_needed(iteration, cached_entropy)
l_align = self.align_loss(cached_entropy, gt_rgb)
total = total + cfg.lambda_sparsity * l_sparse + cfg.lambda_align * l_align
losses['sparsity'] = l_sparse
losses['align'] = l_align
losses['total'] = total
return losses
# ─── SECTION 11: GEF Trainer ──────────────────────────────────────────────────
class GEFTrainer:
"""
GEF training loop with full optimization schedule.
Mirrors the paper's implementation on NVIDIA RTX 4090:
- Geometric regularization starts at iter 15,000
- Entropy field regularization starts at iter 20,000
- k-NN graph built once at iter 20,000 (primitives near surfaces)
- Entropy maps cached and updated every N=10-20 iterations
- Total 30,000 iterations
For full 3DGS densification (splitting/pruning based on opacity and
gradient magnitude): see the official 3DGS repo implementation which
manages adaptive control of Gaussian count.
"""
def __init__(self, gaussians: GaussianPrimitives, cfg: GEFConfig):
self.gaussians = gaussians
self.cfg = cfg
self.loss_fn = GEFLoss(cfg)
self.renderer = SimpleGaussianRenderer(cfg)
self.entropy_cache = EntropyCache(cfg.entropy_update_interval)
self.knn_idx: Optional[Tensor] = None
self.optimizer = torch.optim.Adam([
{'params': gaussians.means, 'lr': 1.6e-4},
{'params': gaussians.log_scales, 'lr': 5e-3},
{'params': gaussians.quats, 'lr': 1e-3},
{'params': gaussians.colors, 'lr': 2.5e-3},
{'params': gaussians.opacity_logits, 'lr': 5e-2},
])
def _build_knn_if_needed(self, iteration: int) -> None:
"""Build k-NN graph at iter 20,000 when primitives stabilize near surfaces."""
if iteration == self.cfg.entropy_start_iter and self.knn_idx is None:
print(f" [iter {iteration}] Building k-NN graph (K={self.cfg.K})...")
self.knn_idx = build_knn_graph(self.gaussians.means.detach(), self.cfg.K)
print(f" k-NN graph built. Shape: {self.knn_idx.shape}")
def train_step(
self,
iteration: int,
camera_intrinsics: Dict,
camera_pose: Tensor,
gt_rgb: Tensor, # (H, W, 3) ground-truth image
H: int, W: int,
) -> Dict[str, float]:
"""Single optimization iteration."""
self._build_knn_if_needed(iteration)
self.optimizer.zero_grad()
# Render scene
rendered = self.renderer.render(self.gaussians, camera_intrinsics, camera_pose, H, W)
# Compute per-primitive image entropy (only when entropy losses are active)
primitive_img_entropy = None
if iteration >= self.cfg.entropy_start_iter and self.knn_idx is not None:
img_entropy = compute_image_entropy(gt_rgb, window_size=7)
means_2d, _, _ = self.renderer.project_gaussians(
self.gaussians, camera_intrinsics, camera_pose)
primitive_img_entropy = compute_primitive_image_entropy(
rendered['alpha_map'].detach(), img_entropy,
means_2d.detach(), H, W
)
# Compute losses
losses = self.loss_fn(
rendered, gt_rgb, self.gaussians, self.knn_idx,
primitive_img_entropy, self.entropy_cache, iteration
)
# Backpropagate and optimize
losses['total'].backward()
torch.nn.utils.clip_grad_norm_(self.gaussians.parameters(), 1.0)
self.optimizer.step()
return {k: v.item() if isinstance(v, Tensor) else v
for k, v in losses.items()}
# ─── SECTION 12: Mesh Extraction & Smoke Test ─────────────────────────────────
class TSDFMeshExtractor:
"""
Mesh extraction via TSDF fusion + Marching Cubes (Section 4, Impl. Details).
Standard GEF mesh pipeline (mirrors paper implementation):
1. Render depth maps for all training views
2. Fuse depth maps into TSDF volume (Curless & Levoy, 1996)
3. Extract mesh via Marching Cubes (Lorensen & Cline, 1987)
For production: use open3d.pipelines.integration.ScalableTSDFVolume
which handles large scenes more efficiently.
Returns a triangle mesh as numpy arrays (vertices, faces).
"""
def __init__(self, voxel_size: float = 0.02, trunc_dist_factor: float = 3.0):
self.voxel_size = voxel_size
self.trunc_dist = voxel_size * trunc_dist_factor
def fuse_depth_maps(
self,
depth_maps: List[Tensor], # list of (H, W) depth maps
poses: List[Tensor], # list of (4, 4) camera-to-world poses
intrinsics: Dict[str, float],
volume_bounds: Tuple, # (min_xyz, max_xyz) tuples
) -> Tuple[np.ndarray, np.ndarray]:
"""
Simplified TSDF fusion and Marching Cubes.
Full implementation requires open3d or custom CUDA TSDF.
This illustrates the algorithmic pipeline:
1. Allocate TSDF voxel volume
2. For each depth map, update TSDF values along view frustum
3. Extract isosurface at TSDF=0 via Marching Cubes
"""
try:
import open3d as o3d
volume = o3d.pipelines.integration.ScalableTSDFVolume(
voxel_length=self.voxel_size,
sdf_trunc=self.trunc_dist,
color_type=o3d.pipelines.integration.TSDFVolumeColorType.NoColor
)
fx = intrinsics['fx']; fy = intrinsics['fy']
cx = intrinsics['cx']; cy = intrinsics['cy']
H, W = depth_maps[0].shape
cam_intrinsic = o3d.camera.PinholeCameraIntrinsic(W, H, fx, fy, cx, cy)
for depth, pose in zip(depth_maps, poses):
depth_np = depth.cpu().numpy().astype(np.float32)
depth_o3d = o3d.geometry.Image(depth_np)
dummy_rgb = o3d.geometry.Image(np.ones((H, W, 3), dtype=np.uint8) * 128)
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
dummy_rgb, depth_o3d, depth_scale=1.0, depth_trunc=5.0)
extrinsic = np.linalg.inv(pose.cpu().numpy())
volume.integrate(rgbd, cam_intrinsic, extrinsic)
mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
vertices = np.asarray(mesh.vertices)
faces = np.asarray(mesh.triangles)
return vertices, faces
except ImportError:
print("[TSDFExtractor] open3d not available. Install: pip install open3d")
# Return placeholder mesh
vertices = np.zeros((4, 3), dtype=np.float32)
faces = np.array([[0,1,2],[0,2,3]], dtype=np.int32)
return vertices, faces
def run_smoke_test(device_str: str = "cpu", n_iters: int = 5):
device = torch.device(device_str)
print("=" * 65)
print(" GEF — Gaussian Entropy Fields — Full Architecture Smoke Test")
print("=" * 65)
torch.manual_seed(42)
# Tiny config for fast CPU test
cfg = GEFConfig(num_gaussians=200, img_h=32, img_w=32, K=10,
geom_start_iter=2, entropy_start_iter=3, total_iters=5)
# [1] Initialize Gaussian primitives
print("\n[1/5] Initializing Gaussian primitives...")
gaussians = GaussianPrimitives(cfg.num_gaussians, init_scale=0.02).to(device)
n_params = sum(p.numel() for p in gaussians.parameters())
print(f" Primitives: {cfg.num_gaussians} | Parameters: {n_params:,}")
# [2] Test SNRI computation
print("\n[2/5] Testing SNRI computation...")
knn_idx = build_knn_graph(gaussians.means.detach(), cfg.K)
snri = compute_snri(gaussians.means, gaussians.normals, knn_idx,
cfg.alpha_snri, cfg.epsilon)
print(f" SNRI shape: {tuple(snri.shape)} min={snri.min():.3f} max={snri.max():.3f}")
assert snri.shape == (cfg.num_gaussians,)
assert 0.0 <= snri.min().item() and snri.max().item() <= 1.0
# [3] Test image entropy
print("\n[3/5] Testing image entropy computation...")
dummy_image = torch.rand(cfg.img_h, cfg.img_w, 3, device=device)
entropy_map = compute_image_entropy(dummy_image, window_size=3)
print(f" Image entropy shape: {tuple(entropy_map.shape)} range=[{entropy_map.min():.3f}, {entropy_map.max():.3f}]")
# [4] Test multi-scale alignment
print("\n[4/5] Testing multi-scale entropy alignment...")
align_module = MultiScaleEntropyAlignment(cfg)
rendered_entropy = torch.rand(cfg.img_h, cfg.img_w, device=device)
l_align = align_module(rendered_entropy, dummy_image)
print(f" Multi-scale alignment loss: {l_align.item():.4f}")
# [5] Full training loop
print(f"\n[5/5] Full training loop ({n_iters} steps with tiny config)...")
trainer = GEFTrainer(gaussians, cfg)
camera_intrinsics = {'fx': 20.0, 'fy': 20.0, 'cx': 16.0, 'cy': 16.0}
camera_pose = torch.eye(4, device=device)
camera_pose[2, 3] = 2.0 # camera 2m in front of origin
gt_rgb = torch.rand(cfg.img_h, cfg.img_w, 3, device=device)
for step in range(n_iters):
losses = trainer.train_step(step, camera_intrinsics, camera_pose, gt_rgb,
cfg.img_h, cfg.img_w)
loss_str = " | ".join(f"{k}={v:.4f}" for k, v in losses.items() if k != 'total')
print(f" iter {step+1}/{n_iters} total={losses['total']:.4f} | {loss_str}")
print("\n" + "=" * 65)
print("✓ All checks passed. GEF is ready for full-scale training.")
print("=" * 65)
print("""
Next steps:
1. Get the official 3DGS codebase (Gaussian primitive densification,
tile-based CUDA rasterizer — required for full speed):
git clone https://github.com/graphdeco-inria/gaussian-splatting
pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn
2. Download benchmarks:
DTU: https://roboimagedata.compute.dtu.dk/?page_id=36
Tanks & Temples: https://www.tanksandtemples.org/download/
Mip-NeRF 360: https://jonbarron.info/mipnerf360/
3. Install mesh extraction dependency:
pip install open3d
4. Scale to full config:
cfg = GEFConfig(num_gaussians=1_000_000, K=50, img_h=540, img_w=960)
5. Run on RTX 4090 (as in paper):
Expected training time ~23 min for DTU scenes
Geometric regularization: iter 15,000+
Entropy field: iter 20,000+ (k-NN fixed at this point)
6. Hyperparameter tuning guide (from paper):
N (entropy update interval): 10-20 for most scenes
β (temperature): 6.0 default; 4-5 for gradual scenes, 8 for sharp
K (neighbors): 25-64, adaptive to local density
""")
return gaussians
if __name__ == "__main__":
run_smoke_test(device_str="cpu", n_iters=5)
Read the Full Paper
The complete study — including per-scene DTU Chamfer Distance scores, qualitative comparisons on Tanks and Temples, UAV photogrammetry results on ISPRS benchmarks, and the full hyperparameter sensitivity analysis — is published open-access in the ISPRS Journal of Photogrammetry and Remote Sensing.
Kuang, H., & Liu, J. (2026). Gaussian entropy fields: Driving adaptive sparsity in 3D Gaussian optimization. ISPRS Journal of Photogrammetry and Remote Sensing, 236, 273–285. https://doi.org/10.1016/j.isprsjprs.2026.04.010
This article is an independent editorial analysis of open-access peer-reviewed research. The PyTorch implementation is an educational adaptation designed to illustrate the paper’s concepts. For production deployments, the official 3DGS CUDA rasterizer (diff-gaussian-rasterization) is required for the tile-based rendering pipeline. The paper’s experiments used a single NVIDIA RTX 4090 GPU. Mesh extraction requires open3d. Supported by the National Natural Science Foundation of China (Grant No. 42571528) and the Taishan Scholars Programme (Grant No. 202408183).
