WorldMesh generates consistent, large-scale 3D worlds by populating a geometric mesh scaffold with image diffusion-derived content.
March 25, 2026
Original Paper
WorldMesh: Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion
arXiv · 2603.22972
The Takeaway
Existing 3D scene generation often fails at scale because of geometric drift; by anchoring appearance to an explicit mesh scaffold, this 'geometry-first' approach maintains multi-room consistency while preserving photorealism. It enables navigable, environment-scale immersive worlds that were previously difficult to generate consistently from text.
From the abstract
Recent progress in image and video synthesis has inspired their use in advancing 3D scene generation. However, we observe that text-to-image and -video approaches struggle to maintain scene- and object-level consistency beyond a limited environment scale due to the absence of explicit geometry. We thus present a geometry-first approach that decouples this complex problem of large-scale 3D scene synthesis into its structural composition, represented as a mesh scaffold, and realistic appearance synthesis. […]
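The decoupling described in the abstract can be sketched as a two-stage pipeline: first lay out coarse scene geometry, then condition an image diffusion model on that geometry to synthesize appearance. The sketch below is purely illustrative; all names (`build_mesh_scaffold`, `texture_with_diffusion`, `RoomBox`) are hypothetical stand-ins, not the paper's actual API, and the diffusion stage is a stub.

```python
# Hypothetical sketch of a geometry-first generation pipeline.
# Stage 1 produces explicit geometry; Stage 2 would condition an image
# diffusion model on renders of that geometry. Names are illustrative only.
from dataclasses import dataclass
from typing import List, Dict, Tuple


@dataclass
class RoomBox:
    """Axis-aligned box standing in for one room of the mesh scaffold."""
    name: str
    min_xyz: Tuple[float, float, float]
    max_xyz: Tuple[float, float, float]


def build_mesh_scaffold(room_specs: List[str]) -> List[RoomBox]:
    """Stage 1: lay out rooms as coarse geometry (here: 4x3x4 m boxes in a row).

    The shared scaffold is what gives the pipeline scene-level consistency:
    every later appearance pass sees the same explicit geometry.
    """
    return [
        RoomBox(spec, (i * 5.0, 0.0, 0.0), (i * 5.0 + 4.0, 3.0, 4.0))
        for i, spec in enumerate(room_specs)
    ]


def texture_with_diffusion(room: RoomBox) -> Dict[str, str]:
    """Stage 2 (stub): a mesh-conditioned image diffusion model would render
    depth/normal maps of the scaffold and synthesize matching appearance.
    Here we just return a placeholder record."""
    return {"room": room.name, "texture": f"diffusion_output_for_{room.name}"}


def generate_world(room_specs: List[str]) -> List[Dict[str, str]]:
    scaffold = build_mesh_scaffold(room_specs)            # geometry first
    return [texture_with_diffusion(r) for r in scaffold]  # appearance second


world = generate_world(["kitchen", "hallway", "study"])
print([entry["room"] for entry in world])  # → ['kitchen', 'hallway', 'study']
```

The point of the structure is that appearance generation never has to infer geometry: consistency across rooms comes from the scaffold, while the diffusion stage only fills in per-view detail.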