MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.
arXiv · March 16, 2026 · 2603.12936
Why it matters
Converting static assets into interactable ones is a major bottleneck in robotics simulation; this zero-shot framework eliminates manual rigging and avoids mesh inter-penetration during simulation, accelerating the creation of training environments for embodied AI.
From the abstract
Converting static 3D meshes into interactable articulated assets is crucial for embodied AI and robotic simulation. However, existing zero-shot pipelines struggle with complex assets due to a critical lack of physical grounding. Specifically, ungrounded Vision-Language Models (VLMs) frequently suffer from kinematic hallucinations, while unconstrained joint estimation inevitably leads to catastrophic mesh inter-penetration during physical simulation. To bridge this gap, we propose MotionAnymesh, …
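The inter-penetration failure mode the abstract describes can be illustrated with a toy sketch (everything here is hypothetical and not from the paper: the scene geometry, the `sweep_collides` helper, and the point-sample collision test are stand-ins, not the authors' method). A correctly grounded hinge lets a door panel swing clear of its cabinet body, while a badly estimated pivot drives the panel through the body as the joint articulates:

```python
import numpy as np

def rotate_about_axis(points, origin, axis, angle):
    """Rotate points about an arbitrary joint axis (Rodrigues' formula)."""
    axis = axis / np.linalg.norm(axis)
    p = points - origin
    c, s = np.cos(angle), np.sin(angle)
    rotated = p * c + np.cross(axis, p) * s + np.outer(p @ axis, axis) * (1 - c)
    return rotated + origin

def penetrates(points, box_min, box_max):
    """True if any sampled surface point lies strictly inside the base's AABB."""
    return bool(np.all((points > box_min) & (points < box_max), axis=1).any())

# Hypothetical scene: a 1x1 door panel in the y=0 plane, sampled as surface
# points, in front of a cabinet body approximated by an axis-aligned box at y > 0.
door = np.array([[x, 0.0, z] for x in np.linspace(0, 1, 6)
                             for z in np.linspace(0, 1, 6)])
base_min = np.array([-0.1, 0.05, -0.1])
base_max = np.array([1.1, 1.0, 1.1])

def sweep_collides(origin, axis, angles):
    """Articulate the door through `angles` and flag any inter-penetration."""
    return any(penetrates(rotate_about_axis(door, origin, axis, a),
                          base_min, base_max)
               for a in angles)

angles = np.linspace(0.0, -np.pi / 2, 30)   # swing the door open

# A grounded hinge on the door's edge sweeps clear of the cabinet body...
print(sweep_collides(np.array([0.0, 0.0, 0.0]),
                     np.array([0.0, 0.0, 1.0]), angles))  # → False
# ...while an ungrounded pivot estimate (inside the body) drives the door
# straight through it during the same articulation.
print(sweep_collides(np.array([0.5, 0.5, 0.5]),
                     np.array([0.0, 0.0, 1.0]), angles))  # → True
```

This is only a point-sample proxy for the mesh-level collision checks a physics engine performs, but it shows why joint estimates need physical grounding: the geometry alone admits many kinematically plausible pivots, and only constraints against the surrounding body rule out the penetrating ones.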