TL;DR Faithful Contouring replaces the decades-old SDF + Marching Cubes pipeline with Faithful Contour Tokens — encoding triangle meshes directly into 18-dimensional sparse voxel tokens, no watertighting, no distance fields. Pure CUDA, 1024³ in ~1.4 s, 2048³ under 5 s on a single H100.
The past year has seen a surge of 3D generation work — methods like Trellis and Hunyuan3D topping benchmarks almost every month. But here's what's easy to overlook: while network architectures have evolved into DiT, Flow Matching, and Sparse Voxel Transformer, the underlying 3D representation still relies on 1987-era SDF + Marching Cubes. Faithful Contouring offers a fundamentally different approach to 3D voxelization.
Faithful Contouring (FC), accepted as a CVPR 2026 Oral, skips watertighting, SDF field construction, and Marching Cubes entirely. Instead, FC encodes triangle meshes directly into sparse voxel tokens, a new 3D representation that preserves topology, sharp edges, and internal structures. The entire mesh-to-voxel pipeline is implemented as pure CUDA kernels, completing a 1024³ encode-plus-decode round trip in roughly 1.4 seconds and 2048³ in under 5 seconds on a single NVIDIA H100.
Why Is SDF + Marching Cubes the Bottleneck in 3D Voxelization
For years, virtually every feedforward 3D reconstruction and generation method — DeepSDF, Occupancy Networks, and more recently Trellis, Hunyuan3D, Sparc3D, TripoSF — has followed the same pipeline:
Raw mesh → Watertight conversion → SDF / UDF → Marching Cubes → New mesh
This pipeline looks general-purpose, but every stage introduces non-trivial information loss:
- Watertight preprocessing typically uses ε-ball dilation to seal holes. This rewrites topology and turns thin shells into thick shells.
- Sign computation relies on global algorithms like flood-fill or winding numbers for inside/outside classification. These are unstable on non-manifold geometry, open surfaces, and internal cavities, and they resist GPU parallelism.
- Iso-surface extraction via Marching Cubes smooths away sharp edges, destroys internal structure, and leaves behind staircase artifacts.
The more practical bottleneck is resolution: existing SDF-based methods nearly all cap out below 2048³ for 3D voxelization, precisely because global sign propagation and watertight preprocessing costs explode with resolution. Faithful Contouring was designed to bypass this entire pipeline.
What Makes Faithful Contouring Stand Out
Most 3D voxelization methods follow a fixed path: mesh to distance field to iso-surface. Faithful Contouring asks a different question:
Can we skip the detour from mesh → distance field → iso-surface, and instead extract candidate anchor points directly inside each voxel, then reconnect them into a surface based on topological relationships?
This is exactly what Faithful Contouring does. It encodes triangle meshes directly into a set of sparse voxel tokens called Faithful Contour Tokens (FCT). The pipeline has three key properties:
- Distance-field-free — no SDF computation, no sign determination needed
- Rendering-free — no differentiable rendering supervision required
- Fully local — each voxel interacts only with the few triangles that pass through it, making it naturally GPU-parallel
Faithful Contouring vs SDF + Marching Cubes
| Property | SDF + Marching Cubes Pipeline | Faithful Contouring |
|---|---|---|
| Open surfaces / non-manifold | Must watertight first (loses topology) | Natively supported |
| Internal cavities | Erased by flood-fill | Fully preserved |
| Sharp edges / corners | Rounded by MC | Naturally captured via QEF |
| Per-voxel representation | 1 SDF value | 18-dimensional token |
| Editing / assembly | Difficult | Direct token-level operations |
Faithful Contouring supports open surfaces, non-manifold geometry, and internal cavities natively — features that SDF-based voxelization methods lose during watertight preprocessing.
How Faithful Contouring Works: Encoder and Decoder Pipeline
The Faithful Contouring pipeline consists of an Encoder and a Decoder, both built entirely on classical geometry operators with no learned components.
Encoder: Mesh → FCT
For each voxel in the grid, four steps are executed in sequence:
- Active Voxel Detection (SAT): Using the Separating Axis Theorem across 13 candidate axes, we test whether a triangle intersects a given voxel. Intersecting voxels are marked as active primal voxels.
- Intersection Centroids: The Sutherland–Hodgman algorithm clips each triangle against the voxel's six faces to produce a convex polygon, then computes its centroid. By convexity, this centroid is guaranteed to lie inside the voxel. Each centroid, paired with the originating triangle's normal, forms a local geometry sample.
- Anchor Fitting (QEF): This is the core step. All samples in a voxel are fed into a Quadric Error Function (QEF) minimization that solves for a single anchor point.
The objective has two terms. The first enforces tangent-plane consistency, driving the anchor toward the common intersection of all tangent planes, which is why sharp edges and corners are naturally captured. The second provides centroid regularization to suppress drift under ill-conditioned inputs. The solution is given in closed form by a 3×3 normal equation.
Normals are solved via Tikhonov regularization. The entire solve is fully independent per voxel — the fundamental reason FC scales to 2048³.
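Written out under assumed notation, the two terms described above take a form like the following. This is a sketch consistent with the text, not necessarily the exact weighting FC uses: $c_i$ are the intersection centroids in a voxel, $n_i$ the unit normals of their triangles, $\bar{c}$ the mean of the $c_i$, and $\lambda > 0$ a hypothetical regularization weight.

```latex
E(x) = \sum_{i=1}^{k} \bigl( n_i^{\top} (x - c_i) \bigr)^2
       + \lambda \, \lVert x - \bar{c} \rVert^2,
\qquad
\bar{c} = \frac{1}{k} \sum_{i=1}^{k} c_i

% Setting \nabla E(x) = 0 gives the 3x3 normal equation
\Bigl( \sum_{i=1}^{k} n_i n_i^{\top} + \lambda I \Bigr) \, x^{\star}
  = \sum_{i=1}^{k} n_i n_i^{\top} c_i + \lambda \, \bar{c}
```

Because $\lambda I$ is added to a positive semi-definite matrix, the system is always invertible, even when all normals in the voxel are parallel.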
At lower voxel resolutions, the QEF-solved anchors still snap reliably to sharp features.
- Semi-axis Intersections: Möller–Trumbore ray–triangle intersection is performed along the six semi-axis directions (±x, ±y, ±z), producing a binary encoding used during decoding to determine face orientation.
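The semi-axis step can be illustrated with a small CPU sketch. It assumes that rays start at the voxel center and that a hit counts only within a given `reach`; both are illustrative choices, and `semi_axis_code` is a hypothetical helper name, not part of the FC codebase.

```python
EPS = 1e-9
HALF_AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_t(origin, direction, v0, v1, v2):
    """Möller–Trumbore: ray parameter t >= 0 of the hit, or None if no hit."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t >= 0.0 else None

def semi_axis_code(center, reach, triangles):
    """One flag per direction (+x, -x, +y, -y, +z, -z): 1 if any triangle is hit."""
    code = []
    for d in HALF_AXES:
        hits = (ray_triangle_t(center, d, *tri) for tri in triangles)
        code.append(int(any(t is not None and t <= reach for t in hits)))
    return code
```

For a triangle lying in the plane x = 0.25, only the +x flag is set when the rays start at the origin with a reach of 0.5.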
All information is packed per voxel into a single row.
Each active voxel occupies 18 dimensions.
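The intersection-centroid step of the encoder can be sketched on the CPU as follows. The clipping is standard Sutherland–Hodgman against the six planes of an axis-aligned box. The centroid here is area-weighted, which is an assumption, since the text does not say which centroid FC uses. Function names are illustrative.

```python
import math

def clip_polygon_to_box(poly, lo, hi):
    """Sutherland–Hodgman: clip a convex 3D polygon against the box [lo, hi]."""
    for axis in range(3):
        for bound, keep_below in ((lo[axis], False), (hi[axis], True)):
            if not poly:
                return []

            def inside(p):
                return p[axis] <= bound if keep_below else p[axis] >= bound

            out = []
            for i, cur in enumerate(poly):
                prev = poly[i - 1]
                if inside(cur) != inside(prev):
                    # The edge crosses the clipping plane: add the crossing point.
                    t = (bound - prev[axis]) / (cur[axis] - prev[axis])
                    out.append(tuple(prev[k] + t * (cur[k] - prev[k]) for k in range(3)))
                if inside(cur):
                    out.append(cur)
            poly = out
    return poly

def polygon_centroid(poly):
    """Area-weighted centroid of a planar convex polygon via a triangle fan."""
    a = poly[0]
    total, acc = 0.0, [0.0, 0.0, 0.0]
    for b, c in zip(poly[1:-1], poly[2:]):
        ab = [b[k] - a[k] for k in range(3)]
        ac = [c[k] - a[k] for k in range(3)]
        n = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
        area = 0.5 * math.sqrt(sum(x * x for x in n))
        total += area
        for k in range(3):
            acc[k] += area * (a[k] + b[k] + c[k]) / 3.0
    if total == 0.0:  # degenerate polygon: fall back to the vertex mean
        return tuple(sum(p[k] for p in poly) / len(poly) for k in range(3))
    return tuple(x / total for x in acc)
```

A large triangle in the plane z = 0.5 clipped to the unit voxel leaves the unit square at that height, whose centroid is the voxel center.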
Decoder: FCT → Mesh
Decoding consists of two steps:
Global gather: Neighboring primal voxels sharing a dual voxel average their anchor points to produce a unified vertex set.
Quad → Tri: The four dual anchors on each primal face form a quadrilateral. Orientation is determined by the semi-axis code, and the quad is split into two triangles along the diagonal that minimizes normal deviation.
The entire decode is also fully local — no global search involved.
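The Quad → Tri step can be sketched as below. The sketch reads "normal deviation" as the disagreement between the unit normals of the two resulting triangles, which is one plausible interpretation rather than FC's confirmed criterion. The `flip` flag is a hypothetical stand-in for the orientation that the semi-axis code would supply.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(a, b, c):
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n) if length > 1e-12 else (0.0, 0.0, 0.0)

def split_quad(p0, p1, p2, p3, flip=False):
    """Return two index triples splitting the quad (p0, p1, p2, p3, in cyclic order)."""
    quad = (p0, p1, p2, p3)
    options = (
        ((0, 1, 2), (0, 2, 3)),  # split along diagonal p0-p2
        ((0, 1, 3), (1, 2, 3)),  # split along diagonal p1-p3
    )

    def agreement(tris):
        # Higher dot product means the two triangle normals deviate less.
        n_a = unit_normal(*(quad[i] for i in tris[0]))
        n_b = unit_normal(*(quad[i] for i in tris[1]))
        return dot(n_a, n_b)

    best = max(options, key=agreement)
    if flip:  # reverse the winding of both triangles
        best = tuple(t[::-1] for t in best)
    return best
```

For a planar quad both splits score the same and the first option is returned; the choice only matters when the four anchors are not coplanar.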
Engineering: Sub-Second Voxelization With CUDA Parallelism
FC's algorithm design aligns tightly with GPU parallelism: each voxel depends only on the few triangles intersecting it, the QEF solve is a closed-form 3×3 normal equation, and the entire pipeline contains no global operators. This is why it scales to 2048³.
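To show how small the per-voxel solve is, here is a plain-Python sketch of a closed-form 3×3 anchor fit. It assumes an objective of the form sum_i (n_i · (x − c_i))² + λ‖x − mean(c)‖², which matches the description of the two QEF terms but may differ from FC's exact weighting. `solve_anchor` and `lam` are illustrative names.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_anchor(centroids, normals, lam=1e-2):
    """Solve (sum n n^T + lam I) x = sum n (n . c) + lam * mean(c) for one voxel."""
    k = len(centroids)
    mean = [sum(c[j] for c in centroids) / k for j in range(3)]
    # Start from the regularization part: lam * I on the left, lam * mean on the right.
    A = [[lam if r == col else 0.0 for col in range(3)] for r in range(3)]
    b = [lam * mean[j] for j in range(3)]
    for c, n in zip(centroids, normals):
        d = sum(n[j] * c[j] for j in range(3))  # plane offset n . c
        for r in range(3):
            b[r] += n[r] * d
            for col in range(3):
                A[r][col] += n[r] * n[col]
    # Cramer's rule; A is positive definite for lam > 0, so det3(A) > 0.
    det_a = det3(A)
    x = []
    for j in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][j] = b[r]
        x.append(det3(m) / det_a)
    return tuple(x)
```

With two samples whose planes are x = 0.2 and y = 0.7, the anchor lands on the edge where the planes meet, and the regularizer alone fixes the free z coordinate at the mean of the centroids. This is the mechanism by which sharp edges are captured.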
Specifically, FC avoids the following global processes: flood-fill connected-component propagation, winding-number full-mesh integration, ε-ball dilation for watertighting, and Marching Cubes post-processing. Each voxel's computation is fully independent and maps directly to a CUDA thread.
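The locality argument is easiest to see in the active-voxel test itself, which needs only one triangle and one box. Below is a CPU sketch of the standard 13-axis SAT triangle–box test (3 box face normals, 1 triangle normal, 9 edge cross products). It illustrates the technique and is not FC's CUDA kernel; in a GPU version each (voxel, triangle) pair could be evaluated by its own thread.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def tri_box_overlap(center, half, tri):
    """SAT test: does the triangle intersect the box center +/- half?"""
    v = [sub(p, center) for p in tri]  # move the box to the origin
    edges = [sub(v[1], v[0]), sub(v[2], v[1]), sub(v[0], v[2])]
    box_axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    axes = list(box_axes)                                    # 3 box face normals
    axes.append(cross(edges[0], edges[1]))                   # 1 triangle normal
    axes += [cross(a, e) for a in box_axes for e in edges]   # 9 edge cross products
    for ax in axes:
        if dot(ax, ax) < 1e-18:        # degenerate axis, carries no information
            continue
        proj = [dot(ax, p) for p in v]
        radius = sum(half[k] * abs(ax[k]) for k in range(3))  # box extent along ax
        if min(proj) > radius or max(proj) < -radius:
            return False               # found a separating axis
    return True
```

The nine cross-product axes matter: a triangle can overlap the box on all three coordinate axes and have its plane cut the box, yet still miss it, as in the second test case of the accompanying check.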
We implemented all core operators as pure CUDA kernels — SAT intersection, Sutherland–Hodgman clipping, closed-form QEF, and Möller–Trumbore half-axis intersection — eliminating Python-level loops. Benchmarks on a single NVIDIA H100:
| Resolution | Active Voxels | Encode | Decode | Total |
|---|---|---|---|---|
| 128³ | 71K | 0.27 s | 0.02 s | 0.29 s |
| 256³ | 287K | 0.45 s | 0.06 s | 0.51 s |
| 512³ | 1.1M | 0.52 s | 0.17 s | 0.70 s |
| 1024³ | 4.6M | 0.82 s | 0.61 s | 1.42 s |
| 2048³ | 18.4M | 2.16 s | 2.51 s | 4.68 s |
Key numbers: 512³ end-to-end in 0.7 s, 1024³ in roughly 1.4 s, 2048³ under 5 s. Overall latency scales approximately linearly with active voxel count.
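The scaling behavior can be checked directly from the benchmark table. The short script below uses only the table's numbers and computes the growth per resolution doubling and the marginal cost per active voxel at the high end. The function names are illustrative.

```python
# (resolution, active voxels, total seconds), copied from the benchmark table
ROWS = [
    (128, 71e3, 0.29),
    (256, 287e3, 0.51),
    (512, 1.1e6, 0.70),
    (1024, 4.6e6, 1.42),
    (2048, 18.4e6, 4.68),
]

def growth_factors(rows):
    """Growth of (active voxels, total time) between adjacent resolutions."""
    return [(n1 / n0, t1 / t0) for (_, n0, t0), (_, n1, t1) in zip(rows, rows[1:])]

def marginal_cost_us(rows):
    """Extra microseconds per extra active voxel between the two largest grids."""
    (_, n0, t0), (_, n1, t1) = rows[-2], rows[-1]
    return (t1 - t0) / (n1 - n0) * 1e6

if __name__ == "__main__":
    for (res, _, _), (vox, sec) in zip(ROWS[1:], growth_factors(ROWS)):
        print(f"up to {res}^3: voxels x{vox:.2f}, time x{sec:.2f}")
    print(f"marginal cost: {marginal_cost_us(ROWS):.2f} us per active voxel")
```

Active voxel counts grow by about 4× per doubling, as expected for a surface rather than a volume. Total time grows more slowly than that at small grids, which suggests a fixed overhead there. Between 1024³ and 2048³ the marginal cost works out to roughly a quarter of a microsecond per active voxel.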
For comparison, traditional SDF reconstruction pipelines based on flood-fill or winding numbers typically require minutes to tens of minutes at 1024³, and cannot practically scale to 2048³ due to the memory and compute overhead of global sign propagation. FC not only runs directly at 2048³ but also keeps end-to-end latency on the order of seconds.
Experimental Results
Representation Accuracy
On challenging subsets from ABO and Objaverse:
| Method | Resolution | HD ↓ | CD (G→P) ↓ | F-score (0.01) ↑ |
|---|---|---|---|---|
| UDF | 1024 | High (double-layer artifacts) | High | Low |
| Flood-fill SDF | 1024 | High (inflated surfaces) | High | Medium |
| FlexiCubes | 1024 | Medium | Medium | Medium |
| FC | 1024 | 0.11 × 10⁻² | 0.01 × 10⁻⁴ | 99.71 |
| FC | 2048 | 0.11 × 10⁻² | < 0.01 × 10⁻⁴ | 99.99 |
FC is currently the only voxel representation that runs directly at 2048³, with distance errors stable at the 10⁻⁵ scale.
VAE Reconstruction
To validate FCT as a deep learning representation, we built a dual-mode VAE using sparse 3D convolutions plus lightweight attention, supporting two input modes: (a) self-compression FCT → FCT, and (b) point cloud → FCT.
Compared against Trellis, SparseFlex, and Sparc3D on the Dora benchmark and Toys4k:
- Chamfer Distance reduced by approximately 93%
- F-score improved by approximately 35%
Notably, FC-VAE at 512³ resolution already surpasses SparseFlex / Sparc3D reconstructions at 1024³, demonstrating that a near-lossless representation can significantly reduce the learning burden on downstream networks.
Editing and Composition With Faithful Contour Tokens
FCT is a token-based sparse voxel representation, meaning all voxel-level operations transfer directly to tokens. FCT supports four categories of direct token-level operations:
- Filtering — Ray-casting computes visibility; internal hidden voxels and their tokens are removed by threshold.
- Texture — Additional channels are appended to the 18D token, binding texture attributes to anchors.
- Manipulation — Rotation and nonlinear deformation act directly on anchors, followed by connectivity recomputation.
- Partition & Assembly — Merging uses mean aggregation at anchor positions and max aggregation for orientation; splitting copies and separates token groups via geometric or semantic masks.
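The partition and assembly rules above can be illustrated on a toy token store. The layout here is hypothetical: a dict mapping an integer voxel index to an `(anchor, orientation_flags)` pair. It is not the 18-dimensional FCT row, whose layout is not spelled out in this post. Only the aggregation rules (mean for anchors, max for orientation, mask-based splitting) come from the text.

```python
def merge_tokens(a, b):
    """Merge two sparse token sets; overlapping voxels are aggregated."""
    merged = dict(a)
    for key, (anchor_b, orient_b) in b.items():
        if key in merged:
            anchor_a, orient_a = merged[key]
            # Mean aggregation for anchor positions.
            anchor = tuple((x + y) / 2.0 for x, y in zip(anchor_a, anchor_b))
            # Max aggregation for orientation flags.
            orient = tuple(max(x, y) for x, y in zip(orient_a, orient_b))
            merged[key] = (anchor, orient)
        else:
            merged[key] = (anchor_b, orient_b)
    return merged

def split_tokens(tokens, mask):
    """Split a token set in two using a predicate on the voxel index."""
    selected = {k: v for k, v in tokens.items() if mask(k)}
    rest = {k: v for k, v in tokens.items() if not mask(k)}
    return selected, rest

if __name__ == "__main__":
    part_a = {(0, 0, 0): ((0.2, 0.2, 0.2), (1, 0, 0, 0, 0, 0))}
    part_b = {(0, 0, 0): ((0.4, 0.6, 0.2), (0, 1, 0, 0, 0, 0)),
              (1, 0, 0): ((1.5, 0.5, 0.5), (0, 0, 1, 0, 0, 0))}
    whole = merge_tokens(part_a, part_b)
    left, right = split_tokens(whole, lambda idx: idx[0] < 1)
    print(len(whole), len(left), len(right))
```

Since every operation touches only the voxels involved, editing stays as local as encoding and decoding.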
This makes FCT not just an input representation but a geometry token container that supports composition, editing, texturing, and style transfer — well aligned with the next phase of 3D generation: part-level, editable, and multimodal.
Open Source and Reproducibility
The code is fully open-sourced. Starting from v1.5, the codebase was refactored into pure Python + Atom3d, with CUDA kernels distributed as precompiled wheels — no local C++ toolchain required. We provide both a Pixi one-click environment and traditional pip installation, covering everything from research reproduction to production integration.
The repository includes:
- Complete FCT codec (FCTEncoder / FCTDecoder), supporting arbitrary resolutions from 128³ to 2048³
- BVH-accelerated mesh intersection backend Atom3d
- Demo scripts and sample meshes (icosphere, pirateship, etc.) — ready to run and compare GLB outputs out of the box
- FCT-VAE training code and diffusion model weights (coming soon)
Repository: github.com/Luo-Yihao/FaithC
Summary
What Faithful Contouring does can be summarized in one sentence:
It migrates the 3D representation layer from the distance-field paradigm to the contour-token paradigm.
Distance fields and Marching Cubes date back to the 1980s. Their 2D contemporaries have long been superseded by splines, SDF font rasterizers, and neural rendering across multiple iteration cycles. On the 3D side, deep coupling between the pipeline and established datasets and evaluation frameworks has long prevented systematic rethinking of the representation layer itself. FC offers a new foundation: closed-form solvable, GPU-friendly, and scalable to 2048³ — natively handling open surfaces and non-manifold geometry, fully preserving internal structures and sharp edges, all within 18 dimensions per voxel.
A noteworthy corroboration: shortly after this work was made public, Microsoft released TRELLIS.2 in January 2026, introducing O-Voxel — a field-free sparse voxel representation that likewise bypasses SDF / occupancy fields and encodes arbitrary topology (including non-manifold and open surfaces) directly via dual meshes. Two independent paths converging on similar conclusions within a narrow time window suggests that iso-surfaces are no longer the only option for 3D representation.
Limitations
The current version of FC has several clear limitations:
- Severe self-intersection and tightly layered structures — When multiple thin surfaces interleave or press together at sub-voxel scale, samples within a single voxel come from different sheets. The QEF-solved anchor becomes ambiguous and exhibits local drift.
- Underutilized VAE capacity — The current FCT-VAE adopts the sparse convolution + local attention backbone from Sparc3D / SparseFlex. Its modeling capacity for extremely fine branches, dense decorations, and other highly irregular structures still has room for improvement.
- Sharpness degradation through decoding — FCT decoded through the VAE shows slight loss in sharpness and smoothness compared to direct fitting, inherent to latent compression. This also points to the need for finer alignment between token representations and latent dimensions.
What's Next: Version 2 Preview
In the upcoming Faithful Contouring 2.0, we extend FCT to a multi-anchor form — a single voxel is no longer limited to one anchor but can host multiple anchors to represent complex intersecting structures. This will systematically improve FC's performance on self-intersections, tightly layered geometries, and nested thin shells, while providing downstream networks with finer-grained local geometric information. Stay tuned.
Citation
@inproceedings{luo2026faithfulcontouring,
  title     = {Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface},
  author    = {Luo, Yihao and He, Xianglong and Pan, Chuanyu and Chen, Yiwen and Wu, Jiaqi
               and Li, Yangguang and Ouyang, Wanli and Hu, Yuanming and Yang, Guang and Yap, ChoonHwai},
  booktitle = {CVPR},
  year      = {2026}
}
Enough with SDF + Marching Cubes? Time to bring geometry back, faithfully.
CVPR 2026 Oral · arXiv 2511.04029 · GitHub: Luo-Yihao/FaithC
Yihao Luo, Imperial College London — y.luo23@imperial.ac.uk


