AI & ML New Capability

Automates the entire robot training pipeline by using video generation models as motion priors to synthesize both simulation environments and expert trajectories.

arXiv · March 20, 2026 · 2603.18811

Songjia He, Zixuan Chen, Hongyu Ding, Dian Shao, Jieqi Shi, Chenxu Li, Jing Huo, Yang Gao

The Takeaway

This framework eliminates the need for manual asset curation and heuristic-based motion planning in robotics. By leveraging the rich priors of video models, it generates executable expert data from natural language, facilitating zero-shot sim-to-real transfer for novel objects.
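The two-stage flow described above — natural language in, a simulation scene plus an executable trajectory out — can be sketched at a very high level. Everything below is a hypothetical skeleton with stub logic, not the authors' V-Dreamer API: in the real system, scene synthesis and motion generation would be driven by generative video models, not string parsing and interpolation.

```python
from dataclasses import dataclass, field

@dataclass
class SimEnvironment:
    # Hypothetical container for a generated, simulation-ready scene.
    objects: list
    instruction: str

@dataclass
class Trajectory:
    # Placeholder format: a list of end-effector waypoints (x, y, z).
    waypoints: list = field(default_factory=list)

def generate_environment(instruction: str) -> SimEnvironment:
    """Stand-in for the open-vocabulary scene-synthesis stage.
    A real pipeline would generate assets from the instruction;
    here we just extract candidate object words."""
    stopwords = {"pick", "up", "the", "place", "on", "a"}
    nouns = [w for w in instruction.lower().split() if w not in stopwords]
    return SimEnvironment(objects=nouns, instruction=instruction)

def synthesize_trajectory(env: SimEnvironment, steps: int = 5) -> Trajectory:
    """Stand-in for the video-prior stage. A real system would lift
    generated video frames into robot motions; here we linearly
    interpolate a reach-and-lower motion as a placeholder."""
    traj = Trajectory()
    for t in range(steps):
        alpha = t / (steps - 1)
        traj.waypoints.append((0.0, alpha * 0.5, 0.2 * (1 - alpha)))
    return traj

env = generate_environment("pick up the mug")
traj = synthesize_trajectory(env)
print(env.objects, len(traj.waypoints))
```

The point of the sketch is only the interface shape: a single instruction string drives both environment construction and expert-trajectory synthesis, which is what lets the framework scale without manual asset curation.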

From the abstract

Training generalist robots demands large-scale, diverse manipulation data, yet real-world collection is prohibitively expensive, and existing simulators are often constrained by fixed asset libraries and manual heuristics. To bridge this gap, we present V-Dreamer, a fully automated framework that generates open-vocabulary, simulation-ready manipulation environments and executable expert trajectories directly from natural language instructions. V-Dreamer employs a novel generative pipeline that c