AI & ML New Capability

Proposes URDF-Anything+, an autoregressive framework that generates fully executable articulated 3D models from raw visual observations.

arXiv · March 17, 2026 · 2603.14010

Zhuangzhe Wu, Yue Xin, Chengkai Hou, Minghao Chen, Yaoxu Lyu, Jieyu Zhang, Shanghang Zhang

The Takeaway

This enables a 'Real-Follow-Sim' paradigm in which digital twins of real-world articulated objects are created on the fly. It addresses a major bottleneck in robotics: the need to hand-design kinematic models before a policy can be trained or tested in simulation.

From the abstract

Articulated objects are fundamental to robotics, physics simulation, and interactive virtual environments. However, reconstructing them from visual input remains challenging, as it requires jointly inferring both part geometry and kinematic structure. We present URDF-Anything+, an end-to-end autoregressive framework that directly generates executable articulated object models from visual observations. Given an image and object-level 3D cues, our method sequentially produces part geometries and their associated …
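To make concrete what an "executable articulated object model" is, the sketch below shows a minimal hand-written URDF for a two-part object (a hypothetical laptop with a revolute hinge) and a small parser that recovers its kinematic structure. This is only an illustration of the target output format (URDF links plus joints), not the paper's actual model output; the `laptop` example and the `kinematic_summary` helper are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written URDF: two rigid parts (links) connected by one
# revolute joint. A framework like URDF-Anything+ aims to produce files
# of this kind directly from visual observations.
URDF = """<?xml version="1.0"?>
<robot name="laptop">
  <link name="base"/>
  <link name="lid"/>
  <joint name="hinge" type="revolute">
    <parent link="base"/>
    <child link="lid"/>
    <axis xyz="0 1 0"/>
    <limit lower="0.0" upper="2.0" effort="1.0" velocity="1.0"/>
  </joint>
</robot>"""

def kinematic_summary(urdf_text: str) -> dict:
    """Parse a URDF string and summarize its part/joint structure."""
    root = ET.fromstring(urdf_text)
    links = [link.get("name") for link in root.findall("link")]
    joints = {
        j.get("name"): {
            "type": j.get("type"),
            "parent": j.find("parent").get("link"),
            "child": j.find("child").get("link"),
        }
        for j in root.findall("joint")
    }
    return {"links": links, "joints": joints}

summary = kinematic_summary(URDF)
print(summary)
```

Because URDF is plain XML, a generated model is "executable" in the sense that a simulator (e.g. one with a URDF loader) can instantiate it directly, which is what makes the on-the-fly digital-twin workflow possible.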