AI & ML New Capability

This method non-rigidly aligns inconsistent video diffusion frames into globally-consistent 3D pointclouds to enable high-quality environment reconstruction.

arXiv · March 18, 2026 · 2603.16736

Lukas Höllein, Matthias Nießner

The Takeaway

It effectively turns state-of-the-art video generation models into consistent 3D world generators. By solving the inherent frame-to-frame inconsistencies of diffusion models, it allows the creation of explorable, sharp 3D environments from purely 2D generative outputs.

From the abstract

Video diffusion models generate high-quality and diverse worlds; however, individual frames often lack 3D consistency across the output sequence, which makes the reconstruction of 3D worlds difficult. To this end, we propose a new method that handles these inconsistencies by non-rigidly aligning the video frames into a globally-consistent coordinate frame that produces sharp and detailed pointcloud reconstructions. First, a geometric foundation model lifts each frame into a pixel-wise 3D pointcloud.
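
The excerpt does not specify which geometric foundation model performs the lifting, but the classical operation it describes — turning a per-pixel depth map into a pixel-wise 3D pointcloud — is pinhole backprojection. A minimal NumPy sketch, assuming a known depth map and camera intrinsics matrix `K` (both hypothetical inputs for illustration):

```python
import numpy as np

def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift a per-pixel depth map (H, W) into a pixel-wise 3D pointcloud
    (H*W, 3) via the pinhole model: X = depth * K^-1 @ [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # homogeneous pixel coordinates, one row per pixel
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T  # camera-space rays with z = 1
    return rays * depth.reshape(-1, 1)  # scale each ray by its depth

# toy example: a 4x4 frame at unit depth with simple intrinsics
K = np.array([[2.0, 0.0, 1.5],
              [0.0, 2.0, 1.5],
              [0.0, 0.0, 1.0]])
points = backproject(np.ones((4, 4)), K)
print(points.shape)  # (16, 3)
```

Running this lift per frame yields one pointcloud per video frame; the paper's contribution is then to non-rigidly align those per-frame clouds into a single globally-consistent coordinate frame.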