AI & ML Paradigm Shift

Enables zero-shot humanoid robot interaction by generating robot-centric 'dream' videos instead of relying on human-to-robot motion retargeting.

March 23, 2026

Original Paper

Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis

Weisheng Xu, Jian Li, Yi Gu, Bin Yang, Haodong Chen, Shuyi Lin, Mingqian Zhou, Jing Tan, Qiwei Wu, Xiangrui Jiang, Taowen Wang, Jiawen Wen, Qiwei Liang, Jiaxi Zhang, Renjing Xu

arXiv · 2603.19709

The Takeaway

Traditional retargeting fails due to the morphology gap between humans and robots; Dream2Act sidesteps this by using generative models to envision the robot's own body performing tasks. This approach achieves a 37.5% success rate on tasks where standard retargeting fails completely, offering a more scalable path for whole-body humanoid control.

From the abstract

Equipping humanoid robots with versatile interaction skills typically requires either extensive policy training or explicit human-to-robot motion retargeting. However, learning-based policies face prohibitive data collection costs, while retargeting relies on human-centric pose estimation (e.g., SMPL), introducing a morphology gap: skeletal scale mismatches produce severe spatial misalignments when mapped to robots, compromising interaction success. In this work, we propose Dream2Act […]
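The spatial-misalignment problem is easy to see with a toy example (not from the paper): if you copy a human's joint angles onto a robot whose limbs are shorter, the end effector lands somewhere else. The limb lengths and joint angles below are made-up illustrative values, using simple planar two-link forward kinematics.

```python
import math

def fk_2link(l1, l2, q1, q2):
    """Planar 2-link forward kinematics: end-effector (x, y) in metres."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)

# Same joint angles, different limb lengths (hypothetical numbers):
q1, q2 = math.radians(45), math.radians(30)
human_hand = fk_2link(0.30, 0.25, q1, q2)   # human upper arm / forearm
robot_hand = fk_2link(0.22, 0.18, q1, q2)   # shorter robot arm

offset = math.dist(human_hand, robot_hand)
print(f"end-effector offset: {offset:.3f} m")  # ~0.14 m off target
```

A ~14 cm offset is more than enough to miss a grasp entirely, which is why angle-level retargeting alone cannot guarantee interaction success across morphologies.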