
Faithful Contouring: No SDF, No Watertight, 1024³ Voxelization

Faithful Contouring encodes meshes into sparse voxel tokens: no SDF, no watertighting, no Marching Cubes. 1024³ in about 1.4 s on a single GPU. CVPR 2026 Oral.

Robin
Posted: June 1, 2026

TL;DR Faithful Contouring replaces the decades-old SDF + Marching Cubes pipeline with Faithful Contour Tokens — encoding triangle meshes directly into 18-dimensional sparse voxel tokens, no watertighting, no distance fields. Pure CUDA, 1024³ in ~1.4 s, 2048³ under 5 s on a single H100.

The past year has seen a surge of 3D generation work, with methods like Trellis and Hunyuan3D topping benchmarks almost every month. But here's what's easy to overlook: while network architectures and training objectives have evolved toward DiTs, flow matching, and sparse voxel transformers, the underlying 3D representation still relies on the 1987-era SDF + Marching Cubes pipeline. Faithful Contouring offers a fundamentally different approach to 3D voxelization.

Faithful Contouring (FC), accepted as a CVPR 2026 Oral, skips watertighting, skips SDF field construction, and skips Marching Cubes entirely. Instead, FC encodes triangle meshes directly into sparse voxel tokens, a new 3D representation that preserves topology, sharp edges, and internal structures. The entire mesh-to-voxel pipeline is implemented as pure CUDA kernels, completing 1024³ voxelization in about 1.4 seconds end to end and 2048³ in under 5 seconds on a single NVIDIA H100.

Why Is SDF + Marching Cubes the Bottleneck in 3D Voxelization

For years, virtually every feedforward 3D reconstruction and generation method — DeepSDF, Occupancy Networks, and more recently Trellis, Hunyuan3D, Sparc3D, TripoSF — has followed the same pipeline:

Raw mesh → Watertight conversion → SDF / UDF → Marching Cubes → New mesh

This pipeline looks general-purpose, but every stage introduces non-trivial information loss:

  • Watertight preprocessing typically uses ε-ball dilation to seal holes. This rewrites topology and turns thin shells into thick shells.

  • Sign computation relies on global algorithms like flood-fill or winding numbers for inside/outside classification. These are unstable on non-manifold geometry, open surfaces, and internal cavities — and they resist GPU parallelism.

  • Iso-surface extraction via Marching Cubes smooths away sharp edges, destroys internal structure, and leaves behind staircase artifacts.

The more practical bottleneck is resolution: existing SDF-based methods nearly all cap out below 2048³ for 3D voxelization, precisely because global sign propagation and watertight preprocessing costs explode with resolution. Faithful Contouring was designed to bypass this entire pipeline.

[Figure: Comparison with baselines]

What Makes Faithful Contouring Stand Out

Most 3D voxelization methods follow a fixed path: mesh to distance field to iso-surface. Faithful Contouring asks a different question:

Can we skip the detour from mesh → distance field → iso-surface, and instead extract candidate anchor points directly inside each voxel, then reconnect them into a surface based on topological relationships?

This is exactly what Faithful Contouring does. It encodes triangle meshes directly into a set of sparse voxel tokens called Faithful Contour Tokens (FCT). The pipeline has three key properties:

  • Distance-field-free — no SDF computation, no sign determination needed
  • Rendering-free — no differentiable rendering supervision required
  • Fully local — each voxel interacts only with the few triangles that pass through it, making it naturally GPU-parallel

Faithful Contouring vs SDF + Marching Cubes

| Property | SDF + Marching Cubes pipeline | Faithful Contouring |
|---|---|---|
| Open surfaces / non-manifold | Must be made watertight first (loses topology) | Natively supported |
| Internal cavities | Erased by flood-fill | Fully preserved |
| Sharp edges / corners | Rounded by MC | Naturally captured via QEF |
| Per-voxel representation | 1 SDF value | 18-dimensional token |
| Editing / assembly | Difficult | Direct token-level operations |

Faithful Contouring supports open surfaces, non-manifold geometry, and internal cavities natively — features that SDF-based voxelization methods lose during watertight preprocessing.

How Faithful Contouring Works: Encoder and Decoder Pipeline

The Faithful Contouring pipeline consists of an Encoder and a Decoder, both built entirely on classical geometry operators with no learned components.

[Figure: Pipeline]

Encoder: Mesh → FCT

For each voxel $v$ in the grid $\mathcal{G}$, four steps are executed in sequence:

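  • Active Voxel Detection (SAT) — Using the Separating Axis Theorem across 13 candidate axes (the 3 voxel face normals, the triangle normal, and the 9 cross products of voxel axes with triangle edges), we test whether a triangle $f$ intersects a given voxel $v$. Intersecting voxels are marked as active primal voxels.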

  • Intersection Centroids — The Sutherland–Hodgman algorithm clips each triangle against the voxel's six faces to produce a convex polygon $Q_{v,f} = v \cap f$, then computes its centroid:

$$\mathbf{c}_{v,f} = \frac{1}{3A}\sum_{k=2}^{m-1} A_k\,(\mathbf{q}_1+\mathbf{q}_k+\mathbf{q}_{k+1})$$

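Here $\mathbf{q}_1,\dots,\mathbf{q}_m$ are the vertices of $Q_{v,f}$, $A_k$ is the area of the fan triangle $(\mathbf{q}_1,\mathbf{q}_k,\mathbf{q}_{k+1})$, and $A=\sum_k A_k$ is the total polygon area. Because $Q_{v,f}$ is convex and contained in the voxel, this centroid is guaranteed to lie inside the voxel. Each $\mathbf{c}_{v,f}$, paired with the originating triangle's normal $\mathbf{n}_f$, forms a local geometry sample.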
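To make the clipping and centroid step concrete, here is a minimal pure-Python sketch, assuming axis-aligned voxels given by their `lo`/`hi` corners. The function names are hypothetical and this is not the released CUDA kernel; it runs one Sutherland–Hodgman pass per voxel face and then evaluates the fan-triangulated centroid formula for $\mathbf{c}_{v,f}$.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def _clip_half_space(poly: List[Vec3], axis: int, bound: float, keep_below: bool) -> List[Vec3]:
    """One Sutherland-Hodgman pass: keep the part of `poly` on one side of x[axis] = bound."""

    def side(p: Vec3) -> float:
        # >= 0 means p lies on the kept side of the clipping plane
        return bound - p[axis] if keep_below else p[axis] - bound

    out: List[Vec3] = []
    for i, cur in enumerate(poly):
        nxt = poly[(i + 1) % len(poly)]
        dc, dn = side(cur), side(nxt)
        if dc >= 0:
            out.append(cur)
        if dc * dn < 0:
            # The edge strictly crosses the plane: emit the crossing point.
            t = dc / (dc - dn)
            out.append(tuple(a + t * (b - a) for a, b in zip(cur, nxt)))
    return out


def clip_triangle_to_voxel(tri: List[Vec3], lo: Vec3, hi: Vec3) -> List[Vec3]:
    """Clip a triangle against the six faces of the axis-aligned voxel [lo, hi]."""
    poly = list(tri)
    for axis in range(3):
        poly = _clip_half_space(poly, axis, lo[axis], keep_below=False)
        if not poly:
            return []
        poly = _clip_half_space(poly, axis, hi[axis], keep_below=True)
        if not poly:
            return []
    return poly


def polygon_centroid(q: List[Vec3]) -> Optional[Vec3]:
    """Area-weighted centroid via a triangle fan from q[0] (the c_{v,f} formula)."""
    total_area = 0.0
    acc = [0.0, 0.0, 0.0]
    for k in range(1, len(q) - 1):
        a, b, c = q[0], q[k], q[k + 1]
        u = [b[i] - a[i] for i in range(3)]
        w = [c[i] - a[i] for i in range(3)]
        cr = (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])
        area = 0.5 * math.sqrt(sum(x * x for x in cr))  # A_k
        total_area += area
        for i in range(3):
            acc[i] += area * (a[i] + b[i] + c[i])
    if total_area == 0.0:
        return None  # degenerate (zero-area) intersection
    return tuple(x / (3.0 * total_area) for x in acc)
```

A triangle that spans the whole cross-section of the voxel clips to the voxel's square slice, whose centroid is the voxel center; a triangle entirely outside the voxel clips to an empty polygon.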

  • Anchor Fitting (QEF) — This is the core step. All samples $\{(\mathbf{c}_i,\mathbf{n}_i)\}$ are fed into a Quadric Error Function minimization:

$$\mathbf{x}^\ast = \arg\min_{\mathbf{x}}\;\sum_i \big(\mathbf{n}_i^\top(\mathbf{x}-\mathbf{c}_i)\big)^2 + \lambda\,\|\mathbf{x}-\bar{\mathbf{c}}\|^2$$

The first term enforces tangent-plane consistency, driving the anchor toward the common intersection of all tangent planes — which is why sharp edges and corners are naturally captured. The second term provides centroid regularization to suppress drift under ill-conditioned inputs. The solution is given in closed form by a 3×3 normal equation:

$$(M^\top M + \lambda I)\,\mathbf{x}^\ast = M^\top\mathbf{d} + \lambda\bar{\mathbf{c}}$$

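Here $M$ stacks the sample normals $\mathbf{n}_i^\top$ as rows, $\mathbf{d}$ collects $d_i = \mathbf{n}_i^\top\mathbf{c}_i$, and $\bar{\mathbf{c}}$ is the centroid of the samples; setting the gradient of the objective to zero yields exactly this system. The anchor normal $\mathbf{n}^\ast$ is likewise solved via Tikhonov regularization. Each solve is fully independent per voxel, which is the fundamental reason FC scales to 2048³.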
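The closed-form solve fits in a few lines of pure Python. This is an illustrative version, not the FC kernel: `solve_anchor` and its default `lam` are hypothetical (the post does not state the value of λ), $\bar{\mathbf{c}}$ is taken as the plain mean of the sample centroids, and Cramer's rule is used because $M^\top M + \lambda I$ is symmetric positive definite for λ > 0.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Cramer's rule for a 3x3 system; safe here since a is SPD when lam > 0."""
    d = _det3(a)
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(_det3(m) / d)
    return x


def solve_anchor(samples: Sequence[Tuple[Vec3, Vec3]], lam: float = 1e-3) -> List[float]:
    """Solve (M^T M + lam I) x = M^T d + lam c_bar for samples (c_i, n_i)."""
    n_s = len(samples)
    c_bar = [sum(c[i] for c, _ in samples) / n_s for i in range(3)]
    a = [[lam if r == s else 0.0 for s in range(3)] for r in range(3)]  # lam * I
    b = [lam * c_bar[r] for r in range(3)]                               # lam * c_bar
    for c, n in samples:
        d_i = sum(n[i] * c[i] for i in range(3))  # d_i = n_i . c_i
        for r in range(3):
            b[r] += n[r] * d_i                    # accumulate M^T d
            for s in range(3):
                a[r][s] += n[r] * n[s]            # accumulate M^T M
    return _solve3(a, b)
```

With three axis-aligned samples around a cube corner, the sample mean sits at (0.4, 0.4, 0.4) while the anchor lands on the corner at roughly (0.6, 0.6, 0.6). With only two planes (an edge), the direction along the edge is unconstrained by the first term and falls back to $\bar{\mathbf{c}}$, which is the drift suppression the regularizer provides.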

[Figure: Low-poly preservation. At lower voxel resolutions, the QEF-solved anchors still snap reliably to sharp features.]

  • Semi-axis Intersections — Möller–Trumbore ray–triangle intersection is performed along six half-axis directions, producing a ternary code in $\{-1,0,+1\}^6$ that is used during decoding to determine face orientation.
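A sketch of the half-axis probe follows. The post does not say how far each ray travels or how the sign is assigned, so both are assumptions here: rays start at a point inside the voxel (for example its center) and are cut off at a hypothetical `max_t`, and the code is +1 when the nearest hit triangle's winding normal points along the ray, -1 when it points against it, and 0 when nothing is hit. The Möller–Trumbore routine itself is the standard two-sided formulation; `half_axis_code` is an illustrative name.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[Vec3, Vec3, Vec3]

HALF_AXES: List[Vec3] = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def moller_trumbore(orig: Vec3, d: Vec3, tri: Tri, eps: float = 1e-12) -> Optional[float]:
    """Distance t >= 0 along the ray orig + t*d to the triangle, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(d, e2)
    det = _dot(e1, p)
    if abs(det) < eps:
        return None  # ray parallel to the triangle's plane
    inv = 1.0 / det
    s = _sub(orig, v0)
    u = _dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t >= 0.0 else None


def half_axis_code(center: Vec3, tris: Sequence[Tri], max_t: float) -> List[int]:
    """Per half-axis: 0 = no hit within max_t, +1/-1 = nearest hit's normal along/against the ray."""
    code = []
    for d in HALF_AXES:
        best: Optional[Tuple[float, Tri]] = None
        for tri in tris:
            t = moller_trumbore(center, d, tri)
            if t is not None and t <= max_t and (best is None or t < best[0]):
                best = (t, tri)
        if best is None:
            code.append(0)
        else:
            v0, v1, v2 = best[1]
            n = _cross(_sub(v1, v0), _sub(v2, v0))  # geometric normal from vertex winding
            code.append(1 if _dot(n, d) > 0 else -1)
    return code
```

Reversing a triangle's vertex order flips the corresponding entry between +1 and -1, which is the kind of orientation signal the decoder can consume.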

All information is packed per voxel into a single row:

$$\mathrm{FCT} = \big[\,\text{voxel index},\;(\mathbf{x}^\ast, \mathbf{n}^\ast),\;\{\mathbf{m}_d, (\mathbf{x}_d, \mathbf{n}_d)\}_{d=1}^{8},\;\{\mathrm{orient}_e\}\,\big]$$

Each active voxel occupies 18 dimensions.
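For the first encoder step, the triangle–voxel test can be sketched with the Separating Axis Theorem over exactly the 13 candidate axes: 3 voxel face normals, the triangle normal, and 9 cross products of voxel axes with triangle edges. This is a plain-Python illustration with a hypothetical function name, not the FC CUDA kernel; touching contacts count as overlap here.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def triangle_overlaps_voxel(center: Vec3, half: Vec3, tri: Sequence[Vec3]) -> bool:
    """SAT test of a triangle against an axis-aligned voxel over the 13 candidate axes."""
    v = [_sub(p, center) for p in tri]  # move the voxel to the origin
    edges = [_sub(v[1], v[0]), _sub(v[2], v[1]), _sub(v[0], v[2])]
    box_normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    axes = list(box_normals)                                     # 3 voxel face normals
    axes.append(_cross(edges[0], edges[1]))                      # 1 triangle normal
    axes += [_cross(a, e) for a in box_normals for e in edges]   # 9 edge-edge cross products
    for ax in axes:
        if _dot(ax, ax) < 1e-24:
            continue  # degenerate axis (edge parallel to a voxel axis)
        proj = [_dot(p, ax) for p in v]
        r = sum(h * abs(c) for h, c in zip(half, ax))  # voxel's projection radius on this axis
        if min(proj) > r or max(proj) < -r:
            return False  # found a separating axis
    return True
```

The last two test triangles show why the triangle-normal axis matters: both triangles' bounding boxes overlap the unit voxel, and only the projection onto the triangle normal separates the second one.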

Decoder: FCT → Mesh

Decoding consists of two steps:

  • Global gather — Neighboring primal voxels sharing a dual voxel average their anchor points to produce a unified vertex set $V'$.

  • Quad → Tri — The four dual anchors on each primal face form a quadrilateral. Orientation is determined by the semi-axis code, and the quad is split into two triangles along the diagonal that minimizes normal deviation.
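The global gather can be read off the token layout $\{\mathbf{m}_d, (\mathbf{x}_d, \mathbf{n}_d)\}_{d=1}^{8}$. The sketch below assumes, as an interpretation rather than a documented detail, that $d$ indexes the 8 corners of a primal voxel, that each corner is a dual vertex shared by up to 8 neighboring voxels, and that $\mathbf{m}_d$ marks whether that corner anchor is active. `gather_dual_vertices` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Index = Tuple[int, int, int]

CORNERS: List[Index] = list(product((0, 1), repeat=3))  # the 8 corner offsets d


def gather_dual_vertices(
    tokens: Dict[Index, List[Tuple[int, Optional[Vec3]]]]
) -> Dict[Index, Vec3]:
    """Average every active corner anchor x_d that lands on the same dual vertex."""
    sums: Dict[Index, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for (i, j, k), corners in tokens.items():
        for (dx, dy, dz), (mask, x) in zip(CORNERS, corners):
            if not mask or x is None:
                continue
            # Neighboring voxels that share this corner accumulate into the same key.
            acc = sums[(i + dx, j + dy, k + dz)]
            for a in range(3):
                acc[a] += x[a]
            acc[3] += 1.0
    return {key: (acc[0] / acc[3], acc[1] / acc[3], acc[2] / acc[3]) for key, acc in sums.items()}
```

Each voxel only touches its own 8 corners, so this step stays local: a GPU version could accumulate the same sums with atomic adds keyed by dual-vertex index.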

The entire decode is also fully local — no global search involved.
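For the Quad → Tri step, the post does not define "normal deviation" precisely. One plausible reading, assumed here, is to pick the diagonal whose two triangles have the most aligned unit normals (the largest dot product). The orientation flip driven by the semi-axis code is omitted, and `split_quad` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
TriIdx = Tuple[int, int, int]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit_normal(q: Sequence[Vec3], t: TriIdx) -> Vec3:
    a, b, c = (q[i] for i in t)
    n = _cross(_sub(b, a), _sub(c, a))
    length = math.sqrt(_dot(n, n))
    return n if length == 0.0 else (n[0] / length, n[1] / length, n[2] / length)


def split_quad(q: Sequence[Vec3]) -> Tuple[TriIdx, TriIdx]:
    """Split a cyclic quad (q0, q1, q2, q3) along the diagonal whose triangles' normals agree best."""
    options = [((0, 1, 2), (0, 2, 3)),   # diagonal q0-q2
               ((0, 1, 3), (1, 2, 3))]   # diagonal q1-q3
    return max(options, key=lambda tris: _dot(_unit_normal(q, tris[0]), _unit_normal(q, tris[1])))
```

Lifting one corner of an otherwise flat quad makes this rule avoid the diagonal through the lifted corner, so the other triangle stays flat. Both options keep the quad's cyclic winding, so triangle orientation is consistent either way.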

Engineering: Sub-Second Voxelization With CUDA Parallelism

FC's algorithm design aligns tightly with GPU parallelism: each voxel depends only on the few triangles intersecting it, the QEF solve is a closed-form 3×3 normal equation, and the entire pipeline contains no global operators. This is why it scales to 2048³.

Specifically, FC avoids the following global processes: flood-fill connected-component propagation, full-mesh winding-number integration (costing $O(N \cdot F)$ for $N$ query points and $F$ triangles), ε-ball dilation for watertighting, and Marching Cubes post-processing. Each voxel's computation is fully independent and maps directly to a CUDA thread.
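To see why winding numbers are a global operator, here is the generalized winding number of a query point in pure Python, using the Van Oosterom–Strackee solid-angle formula. Every query visits every triangle, so evaluating it at $N$ voxel centers costs $O(N \cdot F)$. This illustrates the baseline that FC avoids; it is not part of FC, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def winding_number(p: Vec3, tris: Sequence[Tuple[Vec3, Vec3, Vec3]]) -> float:
    """Generalized winding number: sum of signed solid angles / (4*pi), one term per triangle."""
    total = 0.0
    for v0, v1, v2 in tris:  # O(F) work for every single query point
        a, b, c = _sub(v0, p), _sub(v1, p), _sub(v2, p)
        la, lb, lc = (math.sqrt(_dot(x, x)) for x in (a, b, c))
        num = _dot(a, _cross(b, c))
        den = la * lb * lc + _dot(a, b) * lc + _dot(a, c) * lb + _dot(b, c) * la
        total += 2.0 * math.atan2(num, den)  # signed solid angle of this triangle
    return total / (4.0 * math.pi)
```

On a closed, consistently oriented mesh this returns 1 inside and 0 outside; removing a single face of a tetrahedron makes the interior value fractional, which is the open-surface instability mentioned earlier.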

We implemented all core operators as pure CUDA kernels — SAT intersection, Sutherland–Hodgman clipping, closed-form QEF, and Möller–Trumbore half-axis intersection — eliminating Python-level loops. Benchmarks on a single NVIDIA H100:

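| Resolution | Active voxels | Encode | Decode | Total |
|---|---|---|---|---|
| 128³ | 71K | 0.27 s | 0.02 s | 0.29 s |
| 256³ | 287K | 0.45 s | 0.06 s | 0.51 s |
| 512³ | 1.1M | 0.52 s | 0.17 s | 0.70 s |
| 1024³ | 4.6M | 0.82 s | 0.61 s | 1.42 s |
| 2048³ | 18.4M | 2.16 s | 2.51 s | 4.68 s |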

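Key numbers: 512³ end to end in 0.7 s, 1024³ in about 1.4 s, and 2048³ in under 5 s. At high resolutions, latency grows roughly linearly with active voxel count (going from 1024³ to 2048³ multiplies active voxels by 4 and total time by about 3.3); at low resolutions, fixed overhead dominates.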
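The scaling behavior can be checked with a little arithmetic on the benchmark table. The sketch below copies the five rows, computes seconds per million active voxels, and fits a fixed-cost-plus-linear model by least squares; that model form is an assumption for illustration, not something the post states.

```python
from typing import Dict, List, Tuple

# (resolution, active voxels, end-to-end seconds), copied from the H100 benchmark table
BENCH: List[Tuple[int, float, float]] = [
    (128, 71e3, 0.29), (256, 287e3, 0.51), (512, 1.1e6, 0.70),
    (1024, 4.6e6, 1.42), (2048, 18.4e6, 4.68),
]


def seconds_per_million(bench: List[Tuple[int, float, float]]) -> Dict[int, float]:
    """End-to-end seconds spent per million active voxels at each resolution."""
    return {res: t / (n / 1e6) for res, n, t in bench}


def fit_overhead_plus_linear(bench: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Least-squares fit of total ~= a + b * (active voxels in millions); returns (a, b)."""
    xs = [n / 1e6 for _, n, _ in bench]
    ys = [t for _, _, t in bench]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b
```

On these five rows the fit comes out at roughly 0.38 s of fixed cost plus about 0.23 s per million active voxels, and the per-voxel cost falls from about 4.1 s per million at 128³ to about 0.25 s per million at 2048³.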

For comparison, traditional SDF reconstruction pipelines based on flood-fill or winding numbers typically require minutes to tens of minutes at 1024³, and cannot practically scale to 2048³ due to the memory and compute overhead of global sign propagation. FC not only runs directly at 2048³ but keeps end-to-end latency within a few seconds.

Experimental Results

Representation Accuracy

On challenging subsets from ABO and Objaverse:

| Method | Resolution | HD ↓ | CD (G→P) ↓ | F-score (0.01) ↑ |
|---|---|---|---|---|
| UDF | 1024 | High (double-layer artifacts) | High | Low |
| Flood-fill SDF | 1024 | High (inflated surfaces) | High | Medium |
| FlexiCubes | 1024 | Medium | Medium | Medium |
| FC | 1024 | 0.11 × 10⁻² | 0.01 × 10⁻⁴ | 99.71 |
| FC | 2048 | 0.11 × 10⁻² | < 0.01 × 10⁻⁴ | 99.99 |

FC is currently the only voxel representation that runs directly at 2048³, with distance errors stable at the 10⁻⁵ scale.
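For readers reproducing the table, a brute-force sketch of the two point-set metrics is below. Conventions differ across papers (squared vs. unsquared distances, mean vs. sum, sampling density), and the post does not state which one it uses, so these definitions are assumptions: CD (G→P) here is the mean squared nearest-neighbour distance from ground-truth to predicted samples, and the F-score uses threshold τ = 0.01 and is returned as a fraction, whereas the table reports it as a percentage.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nn_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Brute-force nearest-neighbour distance from every src point to the dst set."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_g2p(gt: Sequence[Point], pred: Sequence[Point]) -> float:
    """One-directional (ground truth -> prediction) Chamfer term: mean squared NN distance."""
    d = nn_distances(gt, pred)
    return sum(x * x for x in d) / len(d)


def f_score(gt: Sequence[Point], pred: Sequence[Point], tau: float = 0.01) -> float:
    """Harmonic mean of precision (pred near gt) and recall (gt near pred) at threshold tau."""
    precision = sum(d < tau for d in nn_distances(pred, gt)) / len(pred)
    recall = sum(d < tau for d in nn_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The quadratic nearest-neighbour search is only suitable for small samples; evaluation code normally uses a KD-tree or GPU nearest-neighbour search over densely sampled surfaces.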

[Figure: Fitting comparison]

VAE Reconstruction

To validate FCT as a deep learning representation, we built a dual-mode VAE using sparse 3D convolutions plus lightweight attention, supporting two input modes: (a) self-compression FCT → FCT, and (b) point cloud → FCT.

Compared against Trellis, SparseFlex, and Sparc3D on the Dora benchmark and Toys4k:

  • Chamfer Distance reduced by approximately 93%
  • F-score improved by approximately 35%

Notably, FC-VAE at 512³ resolution already surpasses SparseFlex / Sparc3D reconstructions at 1024³, demonstrating that a near-lossless representation can significantly reduce the learning burden on downstream networks.

[Figure: VAE comparison]

Editing and Composition With Faithful Contour Tokens

FCT is a token-based sparse voxel representation, meaning all voxel-level operations transfer directly to tokens. FCT supports four categories of direct token-level operations:


  • Filtering — Ray casting computes per-voxel visibility; hidden internal voxels whose visibility falls below a threshold are removed together with their tokens.
  • Texture — Additional channels are appended to the 18D token, binding texture attributes to anchors.
  • Manipulation — Rotation and nonlinear deformation act directly on anchors, followed by connectivity recomputation.
  • Partition & Assembly — Merging uses mean aggregation at anchor positions and max aggregation for orientation; splitting copies and separates token groups via geometric or semantic masks.
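The Partition & Assembly merge rule maps naturally onto a dictionary keyed by voxel index. This sketch uses a deliberately reduced, hypothetical token (an anchor plus six orientation codes) rather than the full 18-dimensional FCT, and `merge_token_sets` is an illustrative name: colliding voxels get the mean anchor and the element-wise maximum orientation, following the rule stated in the Partition & Assembly item.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Tuple[int, int, int]
# Hypothetical reduced token: (anchor x*, orientation codes); the real FCT carries more channels.
Token = Tuple[Tuple[float, float, float], Tuple[int, ...]]


def merge_token_sets(*parts: Dict[Index, Token]) -> Dict[Index, Token]:
    """Union token sets; where voxels collide, average anchors and take element-wise max orientation."""
    grouped: Dict[Index, List[Token]] = defaultdict(list)
    for part in parts:
        for idx, tok in part.items():
            grouped[idx].append(tok)
    merged: Dict[Index, Token] = {}
    for idx, toks in grouped.items():
        k = len(toks)
        anchor = tuple(sum(t[0][a] for t in toks) / k for a in range(3))           # mean aggregation
        orient = tuple(max(t[1][e] for t in toks) for e in range(len(toks[0][1])))  # max aggregation
        merged[idx] = (anchor, orient)
    return merged
```

Splitting is the inverse under the same layout: filter the dictionary with a geometric or semantic mask and copy the selected tokens into separate groups.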

This makes FCT not just an input representation but a geometry token container that supports composition, editing, texturing, and style transfer — well aligned with the next phase of 3D generation: part-level, editable, and multimodal.

Open Source and Reproducibility

The code is fully open-sourced. Starting from v1.5, the codebase was refactored into pure Python + Atom3d, with CUDA kernels distributed as precompiled wheels — no local C++ toolchain required. We provide both a Pixi one-click environment and traditional pip installation, covering everything from research reproduction to production integration.

The repository includes:

  • Complete FCT codec (FCTEncoder / FCTDecoder), supporting arbitrary resolutions from 128³ to 2048³
  • BVH-accelerated mesh intersection backend Atom3d
  • Demo scripts and sample meshes (icosphere, pirateship, etc.) — ready to run and compare GLB outputs out of the box
  • FCT-VAE training code and diffusion model weights (coming soon)

Repository: github.com/Luo-Yihao/FaithC

Summary

What Faithful Contouring does can be summarized in one sentence:

It migrates the 3D representation layer from the distance-field paradigm to the contour-token paradigm.

Distance fields and Marching Cubes date back to the 1980s. Their 2D contemporaries have long been superseded by splines, SDF font rasterizers, and neural rendering across multiple iteration cycles. On the 3D side, deep coupling between the pipeline and established datasets and evaluation frameworks has long prevented systematic rethinking of the representation layer itself. FC offers a new foundation: closed-form solvable, GPU-friendly, and scalable to 2048³ — natively handling open surfaces and non-manifold geometry, fully preserving internal structures and sharp edges, all within 18 dimensions per voxel.

A noteworthy corroboration: shortly after this work was made public, Microsoft released TRELLIS.2 in January 2026, introducing O-Voxel — a field-free sparse voxel representation that likewise bypasses SDF / occupancy fields and encodes arbitrary topology (including non-manifold and open surfaces) directly via dual meshes. Two independent paths converging on similar conclusions within a narrow time window suggests that iso-surfaces are no longer the only option for 3D representation.

Limitations

The current version of FC has several clear limitations:

  • Severe self-intersection and tightly layered structures — When multiple thin surfaces interleave or press together at sub-voxel scale, samples within a single voxel come from different sheets. The QEF-solved anchor becomes ambiguous and exhibits local drift.
  • Underutilized VAE capacity — The current FCT-VAE adopts the sparse convolution + local attention backbone from Sparc3D / SparseFlex. Its modeling capacity for extremely fine branches, dense decorations, and other highly irregular structures still has room for improvement.
  • Sharpness degradation through decoding — FCT decoded through the VAE shows slight loss in sharpness and smoothness compared to direct fitting, inherent to latent compression. This also points to the need for finer alignment between token representations and latent dimensions.

What's Next: Version 2 Preview

In the upcoming Faithful Contouring 2.0, we extend FCT to a multi-anchor form — a single voxel is no longer limited to one anchor but can host multiple anchors to represent complex intersecting structures. This will systematically improve FC's performance on self-intersections, tightly layered geometries, and nested thin shells, while providing downstream networks with finer-grained local geometric information. Stay tuned.

Citation

```bibtex
@inproceedings{luo2026faithfulcontouring,
title     = {Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface},
author    = {Luo, Yihao and He, Xianglong and Pan, Chuanyu and Chen, Yiwen and Wu, Jiaqi
and Li, Yangguang and Ouyang, Wanli and Hu, Yuanming and Yang, Guang and Yap, ChoonHwai},
booktitle = {CVPR},
year      = {2026}
}
```

Enough with SDF + Marching Cubes? Time to bring geometry back — faithfully.

CVPR 2026 Oral · arXiv 2511.04029 · GitHub: Luo-Yihao/FaithC

Yihao Luo, Imperial College London — y.luo23@imperial.ac.uk
