RoofSeg: Revolutionizing Roof Plane Segmentation with Edge-Aware Transformers

In the rapidly evolving field of 3D urban modeling and geospatial analysis, roof plane segmentation plays a pivotal role in reconstructing detailed building models at Levels of Detail (LoD) 2 and 3. Traditionally, this process has relied on manual feature engineering or post-processing techniques like region growing and RANSAC, which often lead to suboptimal results due to error accumulation and parameter sensitivity.

Enter RoofSeg — a novel, end-to-end edge-aware transformer-based network developed by researchers from Wuhan University and Wuhan University of Technology. This groundbreaking approach, detailed in their paper “RoofSeg: An edge-aware transformer-based network for end-to-end roof plane segmentation”, redefines the state-of-the-art in airborne LiDAR point cloud analysis by delivering highly accurate, geometrically faithful roof segmentations without the need for intermediate clustering or refinement steps.

In this article, we’ll explore how RoofSeg works, its key innovations, performance benchmarks, and why it represents a significant leap forward for applications in urban planning, disaster response, and smart city development.


Why Roof Plane Segmentation Matters

Before diving into RoofSeg, it’s important to understand the significance of roof plane segmentation. Airborne LiDAR systems capture dense 3D point clouds of urban environments, which serve as the foundation for digital twin creation, flood risk assessment, solar potential mapping, and more.

However, raw point clouds are unstructured and lack semantic meaning. To make them useful, we must segment them into meaningful parts — especially roof planes, which are flat, geometrically coherent surfaces that define a building’s structure.

Accurate segmentation enables:

  • High-fidelity 3D building reconstruction
  • Automated roof inspection and solar panel placement
  • Urban heat island analysis
  • Damage assessment after natural disasters

Despite its importance, automatic roof plane segmentation remains challenging due to complex roof geometries, noise in LiDAR data, and ambiguous boundaries between adjacent planes.


Limitations of Traditional and Deep Learning Approaches

Existing methods fall into two categories:

1. Traditional Methods (Handcrafted Features)

  • Region Growing: Starts with seed points and expands based on similarity criteria.
  • RANSAC: Iteratively fits planes by sampling random point subsets.
  • Clustering (e.g., DBSCAN): Groups points based on spatial proximity.

Drawbacks:

  • Sensitive to parameter tuning
  • Prone to over- or under-segmentation
  • Struggle with edge regions and complex roof types

2. Deep Learning-Based Methods

Recent approaches use deep neural networks to learn point features automatically. Examples include:

  • PointGroup
  • Mask3D
  • DeepRoofPlane

While these outperform traditional methods, they still face three critical issues:

  1. Not truly end-to-end — rely on post-processing (e.g., clustering, edge refinement)
  2. Poor edge discrimination — struggle to segment sharp boundaries accurately
  3. Lack of geometric constraints — do not enforce planarity during training

These limitations result in misclassified points, jagged edges, and low geometric fidelity — all of which degrade downstream applications.


Introducing RoofSeg: The First Truly End-to-End Solution

RoofSeg addresses these challenges head-on with a transformer-based architecture that performs end-to-end roof plane instance segmentation directly from raw LiDAR point clouds.

✅ Key Features of RoofSeg:

  • No post-processing or clustering required
  • Edge-aware mask refinement for sharp boundaries
  • Geometrically constrained loss functions
  • High accuracy across diverse roof types

The model leverages a query-based transformer decoder to predict plane instance masks, inspired by success in 2D vision models like Mask2Former and 3D models like Mask3D.


How RoofSeg Works: Architecture Overview

The RoofSeg pipeline consists of four main components:

  1. Point Feature Extraction
  2. Plane Query Generation & Refinement
  3. Edge-Aware Mask Generation
  4. Loss Functions with Geometric Constraints

Let’s break each down.


1. Point Feature Extraction Using PointNet++

RoofSeg uses PointNet++ as its backbone to extract multi-scale features from raw point clouds. It enhances the standard Feature Propagation (FP) layers with attention mechanisms, enabling robust feature learning across different resolutions.

\[ \text{Input: } \; \text{Raw 3D point cloud } P = \{ p_i \}_{i=1}^{N} \] \[ \text{Output: } \; \text{Multi-scale features } F_{\text{multi}} = \{ F^{l} \in \mathbb{R}^{\left(\tfrac{N}{4^{\,l-1}}\right)\times 256} \}_{l=1}^{4} \]

    This hierarchical design captures both local details and global context.
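
    To make the hierarchy concrete, here are the per-level sizes implied by the formula above for a 2,048-point roof patch (the 4x downsampling ratio is read directly off the formula; the actual layer sizes in the released code may differ):

    # Illustrative only: feature map sizes implied by F^l in R^{(N / 4^(l-1)) x 256}.
    N, C, levels = 2048, 256, 4
    for l in range(1, levels + 1):
        print(f"level {l}: {N // 4 ** (l - 1)} points x {C} feature channels")
    # level 1: 2048 x 256 | level 2: 512 x 256 | level 3: 128 x 256 | level 4: 32 x 256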


    2. Plane Query Generation and Refinement

    Inspired by DETR-style transformers, RoofSeg introduces learnable plane queries — fixed-size vectors that represent potential roof planes.

    • Query Initialization: K query points are sampled using Farthest Point Sampling (FPS)
    • Fourier Encoding: Spatial coordinates are encoded using Fourier features for better generalization on unstructured data (a minimal sketch follows this list)
    • Query Refinement Decoders (QRD): Four transformer decoder layers refine the queries by fusing them with multi-scale point features via cross-attention
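
    Before the decoders run, each sampled seed coordinate is lifted into a higher-dimensional positional code. The sketch below shows one plausible way to do this; the number of frequencies and the naive FPS loop are illustrative assumptions, not values from the paper:

    import math
    import torch

    def fourier_encode(coords, num_freqs=8):
        """Map (K, 3) coordinates to [sin(2^i * pi * x), cos(2^i * pi * x)] features."""
        freqs = (2.0 ** torch.arange(num_freqs)) * math.pi                  # (num_freqs,)
        angles = coords.unsqueeze(-1) * freqs                               # (K, 3, num_freqs)
        return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)   # (K, 6 * num_freqs)

    def naive_fps(xyz, k):
        """O(N*K) farthest point sampling, kept deliberately simple for readability."""
        idx = [int(torch.randint(len(xyz), (1,)))]
        dist = torch.full((len(xyz),), float("inf"))
        for _ in range(k - 1):
            dist = torch.minimum(dist, ((xyz - xyz[idx[-1]]) ** 2).sum(-1))
            idx.append(int(dist.argmax()))
        return torch.tensor(idx)

    points = torch.rand(2048, 3)
    seeds = naive_fps(points, k=16)
    query_pos = fourier_encode(points[seeds])   # (16, 48) positional features for the plane queries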

    The cross-attention mechanism computes:

    \[ Q_{\text{att}} = \text{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \]

    Where:

    • Q: query embeddings
    • K, V: keys and values projected from the point features
    • d_k = D = 256: feature dimension used for scaling

    Self-attention among queries suppresses duplicates, ensuring one query per plane.
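
    For concreteness, the attention step above boils down to a scaled dot product between the K plane queries and the N point features. A minimal sketch (sizes are illustrative):

    import torch
    import torch.nn.functional as F

    B, K, N, D = 2, 8, 2048, 256
    queries = torch.randn(B, K, D)        # plane query embeddings Q
    point_feats = torch.randn(B, N, D)    # per-point features, serving as keys and values

    scores = queries @ point_feats.transpose(1, 2) / D ** 0.5   # (B, K, N): Q K^T / sqrt(d_k)
    attn = F.softmax(scores, dim=-1)                             # each query attends over all points
    q_att = attn @ point_feats                                   # (B, K, D): refined query embeddings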


    3. Edge-Aware Mask Module (EAMM)

    This is where RoofSeg shines. Most models fail at edge regions, where points belong to multiple planes. RoofSeg introduces the Edge-Aware Mask Module (EAMM) to refine masks using geometric priors.

    Steps in EAMM:

    1. Predict an edge mask using a dedicated branch
    2. Extract edge points and compute their tangent distance to the fitted plane using PCA
    3. Fuse distance features with point features
    4. Apply self-attention to enhance edge discriminability
    5. Generate refined plane masks

    By incorporating point-to-plane distance as a geometric cue, EAMM significantly improves edge accuracy.
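
    Step 2 is the geometric heart of the module. Here is a small NumPy sketch of the idea as we read it (fit a plane to the non-edge inliers with PCA, then measure how far each edge point sits from that plane), using toy data:

    import numpy as np

    def fit_plane_pca(pts):
        """Fit a plane to (M, 3) points; returns the centroid and unit normal (smallest principal direction)."""
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)   # rows of vt are the principal directions
        return centroid, vt[-1]

    inliers = np.random.rand(200, 3) * [10.0, 10.0, 0.01]   # roughly planar non-edge points (toy data)
    edge_pts = np.random.rand(30, 3) * [10.0, 10.0, 1.0]    # candidate edge points
    centroid, normal = fit_plane_pca(inliers)
    edge_dist = np.abs((edge_pts - centroid) @ normal)       # per-point distances fused as an extra feature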


    4. Novel Loss Functions for Robust Training

    RoofSeg introduces two innovative loss components:

    🔹 Adaptive Weighting Mask Loss (L_wmask)

    Standard mask loss treats all points equally. RoofSeg detects outliers — points whose label differs from most neighbors — and assigns them higher weights:

    \[ w_j = \begin{cases} \dfrac{(N_{\text{out}}\,N - N_{\text{out}})\,N_{\text{nbr}}}{N_{\text{dif}}^{\,j}}, & \text{if } p_j \in \text{outliers}, \\[10pt] 1, & \text{otherwise}, \end{cases} \]

    where N is the total number of points, N_out the number of detected outlier points, N_nbr the neighborhood size used for the check, and N_dif^j the number of neighbors of p_j whose label differs from its own.

    Then applies weighted BCE and Dice losses:

    \[ L_{\text{wBCE}} = -\frac{1}{N} \sum_{j=1}^{N} w_j \Big[ l^{\text{gt}}_j \log(a_j) + (1 - l^{\text{gt}}_j)\log(1 - a_j) \Big] \] \[ L_{\text{wDice}} = 1 - \frac{2 \sum_j w_j a_j l^{\text{gt}}_j + \epsilon} {\sum_j w_j a_j + \sum_j w_j l^{\text{gt}}_j + \epsilon}, \quad \epsilon = 1 \]

    This reduces the impact of misclassified points.
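
    A compact PyTorch sketch of the weighted BCE + Dice combination (the per-point weights w are assumed to be given; deriving them from the outlier rule above is left out):

    import torch

    def weighted_bce_dice(pred, gt, w, eps=1.0):
        """pred: (N,) probabilities, gt: (N,) binary labels, w: (N,) per-point weights."""
        bce = -(w * (gt * torch.log(pred.clamp_min(1e-7))
                     + (1 - gt) * torch.log((1 - pred).clamp_min(1e-7)))).mean()
        dice = 1 - (2 * (w * pred * gt).sum() + eps) / ((w * pred).sum() + (w * gt).sum() + eps)
        return bce + dice

    pred = torch.rand(2048)                   # predicted membership probabilities for one plane
    gt = (torch.rand(2048) > 0.5).float()     # ground-truth mask
    w = torch.ones(2048); w[:50] = 4.0        # e.g. up-weight 50 detected outlier points
    loss = weighted_bce_dice(pred, gt, w)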

    🔹 Plane Geometric Loss (L_plane)

    To enforce planarity, RoofSeg adds a geometric loss based on point-to-plane distance:

    \[ L_{\text{plane}} = \frac{1}{|P|} \sum_{p_j \in P} \text{Dist}(p_j, \Pi) \]

    Where Π is the plane fitted via PCA on inlier points. This ensures high geometric fidelity.
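
    Sketched for a single predicted plane, with a soft membership mask so the whole term stays differentiable (the paper fits Π on the hard inlier set; the soft weighting here is our simplification):

    import torch

    def plane_loss(points, mask):
        """points: (N, 3); mask: (N,) soft membership in one plane. Returns the mean point-to-plane distance."""
        w = mask / (mask.sum() + 1e-6)
        centroid = (w.unsqueeze(-1) * points).sum(dim=0)
        cov = (w.unsqueeze(-1) * (points - centroid)).T @ (points - centroid)
        normal = torch.linalg.eigh(cov).eigenvectors[:, 0]    # eigenvector of the smallest eigenvalue
        dist = ((points - centroid) @ normal).abs()
        return (mask * dist).sum() / (mask.sum() + 1e-6)

    pts = torch.rand(2048, 3)
    soft_mask = torch.rand(2048)              # predicted membership probabilities for one plane
    l_plane = plane_loss(pts, soft_mask)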

    The total loss is:

    \[ L = L_{\text{wmask}} + L_{\text{plane}} + L_{\text{cls}} + L_{\text{wedge}} \]

    With bipartite matching used to align predictions with ground truth.
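
    The matching itself is the standard Hungarian assignment between predicted and ground-truth planes. A minimal sketch with SciPy, using 1 - IoU of the masks as a toy cost (the paper's cost also folds in the classification and mask losses):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_planes(pred_masks, gt_masks):
        """pred_masks: (K, N), gt_masks: (G, N), binary arrays. Returns matched (pred_idx, gt_idx)."""
        inter = pred_masks @ gt_masks.T                                    # (K, G) intersection sizes
        union = pred_masks.sum(1, keepdims=True) + gt_masks.sum(1) - inter
        cost = 1.0 - inter / np.maximum(union, 1e-6)                       # 1 - IoU as the assignment cost
        return linear_sum_assignment(cost)

    pred = (np.random.rand(6, 2048) > 0.5).astype(float)
    gt = (np.random.rand(4, 2048) > 0.5).astype(float)
    pred_idx, gt_idx = match_planes(pred, gt)   # the losses above are computed on these matched pairs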


    Performance: Outperforming the State of the Art

    RoofSeg was evaluated on three benchmarks:

    1. RoofNTNU (real-world, 1,032 roofs)
    2. Roofpc3D (synthetic, 15,400 roofs)
    3. Building3D (large-scale real-world, 18,708 roofs)

    ✅ Evaluation Metrics:

    • mCov: mean coverage (point-level accuracy)
    • mWCov: mean weighted coverage
    • mPrec: mean precision (instance-level)
    • mRec: mean recall (instance-level)
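
    For reference, the two coverage metrics can be computed roughly as follows, using the definitions common in point cloud instance segmentation (mCov averages the best IoU achieved for each ground-truth plane; mWCov weights that average by plane size). This reflects our reading of the metrics rather than code from the paper:

    import numpy as np

    def coverage_metrics(gt_masks, pred_masks):
        """gt_masks: (G, N), pred_masks: (K, N), binary arrays. Returns (mCov, mWCov)."""
        inter = gt_masks @ pred_masks.T
        union = gt_masks.sum(1, keepdims=True) + pred_masks.sum(1) - inter
        best_iou = (inter / np.maximum(union, 1e-6)).max(axis=1)   # best-matching prediction per GT plane
        sizes = gt_masks.sum(1)
        return best_iou.mean(), float((best_iou * sizes).sum() / sizes.sum())

    gt = (np.random.rand(4, 2048) > 0.5).astype(float)
    pred = (np.random.rand(6, 2048) > 0.5).astype(float)
    mcov, mwcov = coverage_metrics(gt, pred)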

    📊 Comparative Results (Best in Bold)

    | Method | mCov ↑ | mWCov ↑ | mPrec ↑ | mRec ↑ |
    |---|---|---|---|---|
    | Region Growing | 0.7016 | 0.7549 | 0.7586 | 0.7347 |
    | RANSAC | 0.7291 | 0.7213 | 0.7457 | 0.7318 |
    | GoCoPP | 0.8092 | 0.8547 | 0.8690 | 0.8319 |
    | PointGroup | 0.8164 | 0.8421 | 0.8599 | 0.8328 |
    | Mask3D | 0.8803 | 0.8951 | 0.9500 | 0.9018 |
    | DeepRoofPlane | 0.9113 | 0.9374 | 0.9769 | 0.9438 |
    | RoofSeg (Ours) | **0.9589** | **0.9682** | **0.9960** | **0.9818** |

    RoofSeg improves mCov by roughly 4.8 percentage points and mRec by roughly 3.8 points over the previous best (DeepRoofPlane), demonstrating superior segmentation quality.


    Ablation Studies: What Makes RoofSeg Work?

    | Configuration | mCov (RoofNTNU) | mCov (Building3D) |
    |---|---|---|
    | Baseline (no EAMM, no L_wmask / L_plane) | 0.8747 | 0.8424 |
    | + EAMM | 0.9176 | 0.8954 |
    | + Adaptive Loss | 0.9325 | 0.9181 |
    | + Geometric Loss | 0.9589 | 0.9374 |

    • EAMM improves edge accuracy by ~4%
    • Adaptive weighting reduces outliers
    • Geometric loss ensures planar consistency

    Visual results confirm that removing any component leads to jagged edges, misclassified points, or under-segmentation.


    Efficiency and Scalability

    Despite its power, RoofSeg is efficient:

    • Batch processing: 2,048 points per roof
    • Inference time: ~0.36s per sample (RTX 4090)
    • Memory: < 15 GB during training

    Using multi-scale features in QRD reduces FLOPs by ~20% compared to full-resolution processing, with negligible accuracy drop.


    Applications and Future Work

    RoofSeg is ideal for:

    • 🏗️ Urban 3D modeling
    • ☀️ Solar energy potential assessment
    • 🚨 Disaster damage analysis
    • 🛰️ Autonomous navigation and mapping

    The authors plan to optimize efficiency further and extend RoofSeg to handle non-planar surfaces (e.g., domes, curved roofs).


    Why RoofSeg is a Game-Changer

    | Feature | Traditional Methods | Deep Learning (pre-RoofSeg) | RoofSeg |
    |---|---|---|---|
    | End-to-end? | No | No | Yes |
    | Edge Accuracy | Low | Medium | High |
    | Geometric Fidelity | Variable | Medium | High |
    | Misclassified Points | High | Medium | Minimal |
    | Requires Clustering? | Yes | Yes | No |

    RoofSeg eliminates the need for manual tuning, post-processing, and heuristic rules, making it ideal for large-scale, automated deployment.


    Conclusion: The Future of Roof Segmentation is Here

    RoofSeg sets a new benchmark in roof plane segmentation by combining the power of transformers, edge-aware refinement, and geometrically constrained learning into a single, end-to-end framework.

    Its ability to produce accurate, clean, and geometrically consistent roof segments from raw LiDAR data makes it a powerful tool for researchers, urban planners, and GIS professionals.

    With the source code publicly available on GitHub (github.com/Li-Li-Whu/RoofSeg), RoofSeg is poised to become the go-to solution for 3D building reconstruction.


    Call to Action

    🚀 Ready to transform your 3D point cloud workflows?
    👉 Download RoofSeg on GitHub
    📚 Read the full paper: arXiv:2508.19003v1
    📧 Contact the authors: li.li@whu.edu.cn

    Have you used RoofSeg in your project? Share your results in the comments below or tag us on social media! #RoofSeg #LiDAR #3DReconstruction #DeepLearning #UrbanModeling

    Below is an illustrative, simplified PyTorch sketch of the RoofSeg architecture, written from the paper's description. Several components (PCA-based plane fitting, bipartite matching, adaptive weighting) are reduced to placeholders, so treat it as a study aid rather than the authors' official implementation, which is linked above.

    # main_model.py
    # This script contains the complete implementation of the RoofSeg model,
    # an edge-aware transformer-based network for end-to-end roof plane segmentation.
    
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.decomposition import PCA
    import numpy as np
    
    # --- Helper Modules ---
    
    class PointNetSetAbstraction(nn.Module):
        """
        PointNet Set Abstraction (SA) Layer.
        Performs farthest point sampling, grouping, and feature extraction.
        """
        def __init__(self, npoint, radius, nsample, in_channel, mlp, group_all):
            super(PointNetSetAbstraction, self).__init__()
            self.npoint = npoint
            self.radius = radius
            self.nsample = nsample
            self.mlp_convs = nn.ModuleList()
            self.mlp_bns = nn.ModuleList()
            last_channel = in_channel
            for out_channel in mlp:
                self.mlp_convs.append(nn.Conv2d(last_channel, out_channel, 1))
                self.mlp_bns.append(nn.BatchNorm2d(out_channel))
                last_channel = out_channel
            self.group_all = group_all
    
        def forward(self, xyz, points):
            """
            Forward pass for the SA layer.
            xyz: input point positions, channel-first (B, 3, N)
            points: input point features, channel-first (B, D, N), or None
            Returns new_xyz (B, 3, S) and new_points (B, D', S).
            """
            xyz = xyz.permute(0, 2, 1)                # (B, N, 3)
            if points is not None:
                points = points.permute(0, 2, 1)      # (B, N, D)

            if self.group_all:
                # A single group containing every point (used by the global SA layer).
                new_xyz = torch.zeros(xyz.shape[0], 1, 3).to(xyz.device)
                grouped_xyz = xyz.unsqueeze(1)                                           # (B, 1, N, 3)
                if points is not None:
                    grouped_points = torch.cat([grouped_xyz, points.unsqueeze(1)], dim=-1)
                else:
                    grouped_points = grouped_xyz
            else:
                # Sample centroids with FPS, then group neighbours within the ball radius.
                new_xyz_idx = self.farthest_point_sample(xyz, self.npoint)
                new_xyz = self.index_points(xyz, new_xyz_idx)                            # (B, S, 3)
                group_idx = self.query_ball_point(self.radius, self.nsample, xyz, new_xyz)
                grouped_xyz = self.index_points(xyz, group_idx) - new_xyz.unsqueeze(2)   # local coordinates
                if points is not None:
                    grouped_points = torch.cat(
                        [grouped_xyz, self.index_points(points, group_idx)], dim=-1)
                else:
                    grouped_points = grouped_xyz

            new_points = grouped_points.permute(0, 3, 2, 1)  # (B, C_in, nsample, S)
            for i, conv in enumerate(self.mlp_convs):
                bn = self.mlp_bns[i]
                new_points = F.relu(bn(conv(new_points)))

            new_points = torch.max(new_points, 2)[0]         # (B, C_out, S)
            new_xyz = new_xyz.permute(0, 2, 1)               # (B, 3, S)
            return new_xyz, new_points
    
        def farthest_point_sample(self, xyz, npoint):
            """
            Farthest Point Sampling (FPS) to select a subset of points.
            """
            device = xyz.device
            B, N, C = xyz.shape
            centroids = torch.zeros(B, npoint, dtype=torch.long).to(device)
            distance = torch.ones(B, N).to(device) * 1e10
            farthest = torch.randint(0, N, (B,), dtype=torch.long).to(device)
            batch_indices = torch.arange(B, dtype=torch.long).to(device)
            for i in range(npoint):
                centroids[:, i] = farthest
                centroid = xyz[batch_indices, farthest, :].view(B, 1, 3)
                dist = torch.sum((xyz - centroid) ** 2, -1)
                mask = dist < distance
                distance[mask] = dist[mask]
                farthest = torch.max(distance, -1)[1]
            return centroids
    
        def index_points(self, points, idx):
            """
            Index points using the provided indices.
            """
            device = points.device
            B = points.shape[0]
            view_shape = list(idx.shape)
            view_shape[1:] = [1] * (len(view_shape) - 1)
            repeat_shape = list(idx.shape)
            repeat_shape[0] = 1
            batch_indices = torch.arange(B, dtype=torch.long).to(device).view(view_shape).repeat(repeat_shape)
            new_points = points[batch_indices, idx, :]
            return new_points
    
        def query_ball_point(self, radius, nsample, xyz, new_xyz):
            """
            Group points within a ball radius.
            """
            device = xyz.device
            B, N, C = xyz.shape
            _, S, _ = new_xyz.shape
            group_idx = torch.arange(N, dtype=torch.long).to(device).view(1, 1, N).repeat([B, S, 1])
            sqrdists = self.square_distance(new_xyz, xyz)
            group_idx[sqrdists > radius ** 2] = N
            group_idx = group_idx.sort(dim=-1)[0][:, :, :nsample]
            group_first = group_idx[:, :, 0].view(B, S, 1).repeat([1, 1, nsample])
            mask = group_idx == N
            group_idx[mask] = group_first[mask]
            return group_idx
    
        def square_distance(self, src, dst):
            """
            Calculate squared Euclidean distance between two sets of points.
            """
            B, N, _ = src.shape
            _, M, _ = dst.shape
            dist = -2 * torch.matmul(src, dst.permute(0, 2, 1))
            dist += torch.sum(src ** 2, -1).view(B, N, 1)
            dist += torch.sum(dst ** 2, -1).view(B, 1, M)
            return dist
    
    
    class AttentionBasedFeaturePropagation(nn.Module):
        """
        Attention-based Feature Propagation (AFP) module.
        Upsamples features and refines them using an attention mechanism.
        """
        def __init__(self, in_channel, mlp):
            super(AttentionBasedFeaturePropagation, self).__init__()
            self.mlp_convs = nn.ModuleList()
            self.mlp_bns = nn.ModuleList()
            last_channel = in_channel
            for out_channel in mlp:
                self.mlp_convs.append(nn.Conv1d(last_channel, out_channel, 1))
                self.mlp_bns.append(nn.BatchNorm1d(out_channel))
                last_channel = out_channel
            self.attention = nn.MultiheadAttention(embed_dim=mlp[-1], num_heads=4, batch_first=True)
            self.norm_layer = nn.LayerNorm(mlp[-1])
            self.feed_forward = nn.Sequential(
                nn.Linear(mlp[-1], mlp[-1] * 2),
                nn.ReLU(),
                nn.Linear(mlp[-1] * 2, mlp[-1])
            )
    
        def forward(self, xyz1, xyz2, points1, points2):
            """
            Forward pass for the AFP module.
            xyz1: positions of the dense (target) points, channel-first (B, 3, N)
            xyz2: positions of the sparse (source) points, channel-first (B, 3, S)
            points1: features of the dense points (B, D1, N), or None
            points2: features of the sparse points (B, D2, S)
            Returns upsampled, attention-refined features (B, D', N).
            """
            xyz1 = xyz1.permute(0, 2, 1)        # (B, N, 3)
            xyz2 = xyz2.permute(0, 2, 1)        # (B, S, 3)
            points2 = points2.permute(0, 2, 1)  # (B, S, D2)
            B, N, C = xyz1.shape
            _, S, _ = xyz2.shape
    
            if S == 1:
                interpolated_points = points2.repeat(1, N, 1)
            else:
                dists = self.square_distance(xyz1, xyz2)
                dists, idx = dists.sort(dim=-1)
                dists, idx = dists[:, :, :3], idx[:, :, :3]
                dist_recip = 1.0 / (dists + 1e-8)
                norm = torch.sum(dist_recip, dim=2, keepdim=True)
                weight = dist_recip / norm
                interpolated_points = torch.sum(self.index_points(points2, idx) * weight.view(B, N, 3, 1), dim=2)
    
            if points1 is not None:
                points1 = points1.permute(0, 2, 1)
                new_points = torch.cat([points1, interpolated_points], dim=-1)
            else:
                new_points = interpolated_points
    
            new_points = new_points.permute(0, 2, 1)
            for i, conv in enumerate(self.mlp_convs):
                bn = self.mlp_bns[i]
                new_points = F.relu(bn(conv(new_points)))
    
            # Attention mechanism
            new_points = new_points.permute(0, 2, 1)
            attn_output, _ = self.attention(new_points, new_points, new_points)
            new_points = self.norm_layer(new_points + attn_output)
            ff_output = self.feed_forward(new_points)
            new_points = self.norm_layer(new_points + ff_output)
    
            return new_points.permute(0, 2, 1)

        def square_distance(self, src, dst):
            """
            Squared Euclidean distance between point sets src (B, N, 3) and dst (B, M, 3).
            """
            B, N, _ = src.shape
            _, M, _ = dst.shape
            dist = -2 * torch.matmul(src, dst.permute(0, 2, 1))
            dist += torch.sum(src ** 2, -1).view(B, N, 1)
            dist += torch.sum(dst ** 2, -1).view(B, 1, M)
            return dist

        def index_points(self, points, idx):
            """
            Gather points (B, N, C) by an index tensor of shape (B, S) or (B, S, K).
            """
            device = points.device
            B = points.shape[0]
            view_shape = list(idx.shape)
            view_shape[1:] = [1] * (len(view_shape) - 1)
            repeat_shape = list(idx.shape)
            repeat_shape[0] = 1
            batch_indices = torch.arange(B, dtype=torch.long).to(device).view(view_shape).repeat(repeat_shape)
            return points[batch_indices, idx, :]
    
    # --- Transformer-based Modules ---
    
    class QueryRefinementDecoder(nn.Module):
        """
        Query Refinement Decoder (QRD) module.
        Refines plane queries using cross-attention and self-attention.
        """
        def __init__(self, embed_dim=256, num_heads=8, dropout=0.1):
            super(QueryRefinementDecoder, self).__init__()
            self.cross_attention = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
            self.self_attention = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
            self.norm1 = nn.LayerNorm(embed_dim)
            self.norm2 = nn.LayerNorm(embed_dim)
            self.feed_forward = nn.Sequential(
                nn.Linear(embed_dim, embed_dim * 4),
                nn.ReLU(),
                nn.Linear(embed_dim * 4, embed_dim)
            )
            self.norm3 = nn.LayerNorm(embed_dim)
    
        def forward(self, query_features, point_features):
            """
            Forward pass for the QRD module.
            query_features: (B, K, D)
            point_features: (B, N, D)
            """
            # Cross-attention
            attn_output, _ = self.cross_attention(query_features, point_features, point_features)
            query_features = self.norm1(query_features + attn_output)
    
            # Self-attention
            attn_output, _ = self.self_attention(query_features, query_features, query_features)
            query_features = self.norm2(query_features + attn_output)
    
            # Feed-forward network
            ff_output = self.feed_forward(query_features)
            query_features = self.norm3(query_features + ff_output)
    
            return query_features
    
    class EdgeAwareMaskModule(nn.Module):
        """
        Edge-Aware Mask Module (EAMM).
        Refines plane masks by incorporating geometric priors of edges.
        """
        def __init__(self, embed_dim=256):
            super(EdgeAwareMaskModule, self).__init__()
            self.linear_layer = nn.Linear(embed_dim * 2, embed_dim)
            self.self_attention = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(embed_dim, embed_dim),
                nn.ReLU(),
                nn.Linear(embed_dim, embed_dim)
            )
    
        def forward(self, initial_plane_mask, edge_mask, point_features, query_features):
            """
            Forward pass for the EAMM module (simplified, differentiable sketch).
            initial_plane_mask: (B, K, N) initial per-plane membership probabilities
            edge_mask: (B, N) per-point edge probability
            point_features: (B, N, D)
            query_features: (B, K, D)

            The paper extracts edge points, fits a plane to the non-edge inliers with PCA,
            and feeds the point-to-plane distances back as features. Here that geometric
            cue is approximated in feature space so the module stays batch-safe and
            differentiable; it is a placeholder rather than a faithful re-implementation.
            """
            B, K, N = initial_plane_mask.shape
            D = point_features.shape[-1]
            refined_masks = []

            for i in range(K):
                # Soft inlier weights for the i-th plane, suppressing likely edge points.
                inlier_weight = initial_plane_mask[:, i, :] * (1.0 - edge_mask)              # (B, N)
                w = inlier_weight / (inlier_weight.sum(dim=1, keepdim=True) + 1e-6)

                # Weighted mean feature acts as a crude plane descriptor.
                centroid = torch.sum(w.unsqueeze(-1) * point_features, dim=1, keepdim=True)  # (B, 1, D)

                # Per-point deviation from the descriptor is used as the distance cue.
                distances = torch.norm(point_features - centroid, dim=-1, keepdim=True)      # (B, N, 1)
                distance_feat = distances.expand(-1, -1, D) * edge_mask.unsqueeze(-1)        # emphasise edges

                # Fuse point features with the distance cue.
                fused = self.linear_layer(torch.cat([point_features, distance_feat], dim=-1))

                # Self-attention lets edge points exchange information with plane interiors.
                attn_output, _ = self.self_attention(fused, fused, fused)
                edge_aware_point_features = self.mlp(attn_output)                            # (B, N, D)

                # Refined mask: affinity between the i-th query and the edge-aware point features.
                query_i = query_features[:, i, :].unsqueeze(1)                               # (B, 1, D)
                affinity = torch.matmul(query_i, edge_aware_point_features.transpose(1, 2))  # (B, 1, N)
                refined_masks.append(affinity)

            return torch.cat(refined_masks, dim=1)                                           # (B, K, N)
    
    # --- Main Model: RoofSeg ---
    
    class RoofSeg(nn.Module):
        """
        The main RoofSeg model class.
        """
        def __init__(self, num_classes, num_queries=64, embed_dim=256):
            super(RoofSeg, self).__init__()
            self.num_queries = num_queries
            self.embed_dim = embed_dim
    
            # Feature Extraction Backbone (PointNet++)
            # in_channel counts the 3 local-coordinate channels concatenated with the incoming features.
            self.sa1 = PointNetSetAbstraction(npoint=512, radius=0.1, nsample=32, in_channel=3 + 3, mlp=[64, 64, 128], group_all=False)
            self.sa2 = PointNetSetAbstraction(npoint=128, radius=0.2, nsample=64, in_channel=128 + 3, mlp=[128, 128, 256], group_all=False)
            self.sa3 = PointNetSetAbstraction(npoint=None, radius=None, nsample=None, in_channel=256 + 3, mlp=[256, 512, 1024], group_all=True)

            self.fp3 = AttentionBasedFeaturePropagation(in_channel=1280, mlp=[256, 256])
            self.fp2 = AttentionBasedFeaturePropagation(in_channel=384, mlp=[256, 256])
            self.fp1 = AttentionBasedFeaturePropagation(in_channel=256, mlp=[256, 256, 256])
    
            # Query Generation and Refinement
            self.query_embeddings = nn.Embedding(num_queries, embed_dim)
            self.qrd1 = QueryRefinementDecoder(embed_dim)
            self.qrd2 = QueryRefinementDecoder(embed_dim)
            self.qrd3 = QueryRefinementDecoder(embed_dim)
            self.qrd4 = QueryRefinementDecoder(embed_dim)
    
            # Prediction Heads
            self.edge_prediction_branch = nn.Sequential(
                nn.Linear(embed_dim, embed_dim // 2),
                nn.ReLU(),
                nn.Linear(embed_dim // 2, 1)
            )
            self.query_semantic_branch = nn.Sequential(
                nn.Linear(embed_dim, embed_dim // 2),
                nn.ReLU(),
                nn.Linear(embed_dim // 2, 1) # Binary class (positive/negative)
            )
    
            # Edge-Aware Mask Module
            self.eamm = EdgeAwareMaskModule(embed_dim)
    
        def forward(self, point_cloud):
            """
            Forward pass for the RoofSeg model.
            point_cloud: (B, N, 3)
            """
            # The backbone uses the channel-first PointNet++ convention: positions and features are (B, 3, N).
            l0_xyz = point_cloud.transpose(1, 2)
            l0_points = point_cloud.transpose(1, 2)
    
            # Backbone
            l1_xyz, l1_points = self.sa1(l0_xyz, l0_points)
            l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
            l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)
    
            # Feature Propagation
            l2_points = self.fp3(l2_xyz, l3_xyz, l2_points, l3_points)
            l1_points = self.fp2(l1_xyz, l2_xyz, l1_points, l2_points)
            l0_points = self.fp1(l0_xyz, l1_xyz, None, l1_points)
    
            F1 = l0_points.transpose(1, 2) # Full resolution features
    
            # Query Refinement
            query_features = self.query_embeddings.weight.unsqueeze(0).repeat(point_cloud.shape[0], 1, 1)
            query_features = self.qrd1(query_features, F1)
            query_features = self.qrd2(query_features, F1)
            query_features = self.qrd3(query_features, F1)
            query_features = self.qrd4(query_features, F1)
    
            # Initial Mask Prediction
            affinity_matrix = torch.bmm(query_features, F1.transpose(1, 2))
            initial_plane_masks = torch.sigmoid(affinity_matrix)
    
            # Edge Prediction
            edge_heatmap = self.edge_prediction_branch(F1)
            edge_mask = torch.sigmoid(edge_heatmap.squeeze(-1))
    
            # Edge-Aware Mask Refinement
            refined_plane_masks = self.eamm(initial_plane_masks, edge_mask, F1, query_features)
    
            # Semantic Class Prediction
            semantic_scores = self.query_semantic_branch(query_features).squeeze(-1)
    
            return {
                "refined_masks": refined_plane_masks,
                "semantic_scores": semantic_scores,
                "edge_mask": edge_mask
            }
    
    # --- Loss Functions ---
    
    class RoofSegLoss(nn.Module):
        """
        Combined loss function for RoofSeg.
        """
        def __init__(self, num_neighbors=30):
            super(RoofSegLoss, self).__init__()
            self.num_neighbors = num_neighbors
    
        def forward(self, predictions, targets):
            """
            Calculate the total loss.
            predictions: dictionary from the RoofSeg model
            targets: dictionary with ground truth masks and labels
            """
            # The EAMM outputs raw affinities; convert them to probabilities before the mask losses.
            refined_masks = torch.sigmoid(predictions["refined_masks"])
            semantic_scores = predictions["semantic_scores"]
            edge_mask = predictions["edge_mask"]
    
            gt_masks = targets["masks"] # (B, K_gt, N)
            gt_edge_mask = targets["edge_mask"] # (B, N)
    
            # Bipartite Matching (simplified for clarity)
            # A full implementation would use the Hungarian algorithm.
            # Here we assume a fixed matching for demonstration.
            
            # Adaptive Weighting Plane Mask Loss
            loss_mask = self.adaptive_mask_loss(refined_masks, gt_masks)
            
            # Plane Geometric Loss
            loss_plane = self.plane_geometric_loss(refined_masks, targets["points"])
    
            # Semantic Class Loss
            # This requires matched indices from bipartite matching.
            # Placeholder implementation:
            gt_semantic_labels = torch.ones_like(semantic_scores) # Simplified
            loss_cls = F.binary_cross_entropy_with_logits(semantic_scores, gt_semantic_labels)
    
            # Edge Mask Loss
            loss_edge = self.adaptive_mask_loss(edge_mask.unsqueeze(1), gt_edge_mask.unsqueeze(1))
    
            total_loss = loss_mask + loss_plane + loss_cls + loss_edge
            return total_loss
    
        def adaptive_mask_loss(self, pred_masks, gt_masks):
            """
            Calculates the adaptive weighting mask loss (BCE + Dice).
            """
            weights = self.calculate_adaptive_weights(pred_masks, gt_masks)
            
            # Weighted BCE Loss
            bce_loss = F.binary_cross_entropy(pred_masks, gt_masks, weight=weights, reduction='mean')
            
            # Weighted Dice Loss
            intersection = torch.sum(weights * pred_masks * gt_masks)
            union = torch.sum(weights * (pred_masks + gt_masks))
            dice_loss = 1 - (2. * intersection + 1e-6) / (union + 1e-6)
            
            return bce_loss + dice_loss
    
        def calculate_adaptive_weights(self, pred_masks, gt_masks):
            """
            Calculates adaptive weights based on neighborhood analysis to penalize outliers.
            """
            # This is a conceptual implementation. A real version would be more complex.
            # For simplicity, we return uniform weights here.
            return torch.ones_like(pred_masks)
    
    
        def plane_geometric_loss(self, pred_masks, points):
            """
            Calculates the plane geometric loss (point-to-plane distance).
            """
            total_dist = 0
            for i in range(pred_masks.shape[1]):
                mask_i = pred_masks[:, i, :] > 0.5
                in_plane_points = points[mask_i]
                
                if in_plane_points.shape[0] > 3:
                    # Simplified PCA-based plane fitting
                    pca = PCA(n_components=3)
                    in_plane_points_np = in_plane_points.detach().cpu().numpy()
                    if in_plane_points_np.shape[0] > 3:
                        pca.fit(in_plane_points_np)
                        normal = torch.tensor(pca.components_[-1], device=points.device, dtype=points.dtype)
                        point_on_plane = torch.tensor(pca.mean_, device=points.device, dtype=points.dtype)
                        
                        # Point-to-plane distance
                        dists = torch.abs(torch.mv(in_plane_points - point_on_plane, normal))
                        total_dist += torch.mean(dists)
            
            return total_dist / pred_masks.shape[1] if pred_masks.shape[1] > 0 else 0
    
    
    if __name__ == '__main__':
        # Example usage
        
        # Create a dummy point cloud
        batch_size = 2
        num_points = 2048
        point_cloud = torch.rand(batch_size, num_points, 3)
    
        # Instantiate the model
        model = RoofSeg(num_classes=10) # num_classes is illustrative
    
        # Forward pass
        predictions = model(point_cloud)
    
        print("--- Model Output Shapes ---")
        print("Refined Masks:", predictions["refined_masks"].shape)
        print("Semantic Scores:", predictions["semantic_scores"].shape)
        print("Edge Mask:", predictions["edge_mask"].shape)
    
        # Dummy targets for loss calculation
        dummy_targets = {
            "masks": torch.rand(batch_size, model.num_queries, num_points),
            "edge_mask": torch.rand(batch_size, num_points),
            "points": point_cloud
        }
    
        # Instantiate and calculate loss
        loss_fn = RoofSegLoss()
        total_loss = loss_fn(predictions, dummy_targets)
        print("\n--- Loss Calculation ---")
        print("Total Loss:", total_loss.item())
    

