Proposes URDF-Anything+, an autoregressive framework that generates fully executable articulated 3D models from raw visual observations.
arXiv · March 17, 2026 · 2603.14010
The Takeaway
This enables a real-to-sim paradigm in which digital twins of real-world articulated objects are created on the fly. It addresses a major bottleneck in robotics: the need for manually designed kinematic models before a policy can be trained or tested in simulation.
From the abstract
Articulated objects are fundamental to robotics, physics simulation, and interactive virtual environments. However, reconstructing them from visual input remains challenging, as it requires jointly inferring both part geometry and kinematic structure. We present URDF-Anything+, an end-to-end autoregressive framework that directly generates executable articulated object models from visual observations. Given image and object-level 3D cues, our method sequentially produces part geometries and their associated …
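To make concrete what an "executable articulated object model" looks like, here is a minimal sketch of the kind of output the URDF format encodes: part geometries as links and kinematic structure as joints. The cabinet-with-door example and all names in it are illustrative assumptions, not output from the paper's system; the snippet just parses a hand-written URDF with the Python standard library to show the part/joint structure a simulator would consume.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal URDF: a cabinet body and a door
# connected by a revolute (hinge) joint. In the paper's
# setting, both the part geometries and this kinematic
# structure would be generated from visual observations.
URDF = """<robot name="cabinet">
  <link name="body"/>
  <link name="door"/>
  <joint name="hinge" type="revolute">
    <parent link="body"/>
    <child link="door"/>
    <axis xyz="0 0 1"/>
    <limit lower="0.0" upper="1.57" effort="10" velocity="1.0"/>
  </joint>
</robot>"""

robot = ET.fromstring(URDF)
# Links are the object's rigid parts; joints define how they move.
links = [link.get("name") for link in robot.findall("link")]
joints = {j.get("name"): j.get("type") for j in robot.findall("joint")}
print(links)   # part names
print(joints)  # joint name -> joint type
```

Because the format is plain XML, a model like this can be loaded directly by standard simulators (e.g. via PyBullet's `loadURDF`), which is what makes the generated models immediately usable for policy training without manual modeling.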