Enables zero-shot humanoid robot interaction by generating robot-centric 'dream' videos instead of relying on human-to-robot motion retargeting.
March 23, 2026
Original Paper
Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis
arXiv · 2603.19709
The Takeaway
Traditional retargeting fails due to the morphology gap between humans and robots; Dream2Act sidesteps this by using generative models to envision the robot's own body performing tasks. This approach achieves a 37.5% success rate on tasks where standard retargeting fails completely, offering a more scalable path for whole-body humanoid control.
From the abstract
Equipping humanoid robots with versatile interaction skills typically requires either extensive policy training or explicit human-to-robot motion retargeting. However, learning-based policies face prohibitive data collection costs. Meanwhile, retargeting relies on human-centric pose estimation (e.g., SMPL), introducing a morphology gap. Skeletal scale mismatches result in severe spatial misalignments when mapped to robots, compromising interaction success. In this work, we propose Dream2Act, a r…
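To see why skeletal scale mismatches cause spatial misalignment, consider a toy planar two-link arm sketch (the link lengths and joint angles here are illustrative assumptions, not values from the paper): naively copying human joint angles onto a robot with shorter links moves the end-effector to a different point in space, so a grasp aimed at the human hand position misses on the robot.

```python
import math

def forward_kinematics(link_lengths, joint_angles):
    """Planar serial arm: accumulate joint angles and sum link vectors
    to get the end-effector position (x, y)."""
    x = y = theta = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y

# Hypothetical link lengths in meters: a human arm vs. a shorter robot arm.
human_links = [0.30, 0.25]
robot_links = [0.24, 0.18]

# Naive retargeting: copy the human joint angles verbatim onto the robot.
angles = [math.radians(40), math.radians(30)]

hx, hy = forward_kinematics(human_links, angles)
rx, ry = forward_kinematics(robot_links, angles)
error = math.hypot(hx - rx, hy - ry)

print(f"human hand: ({hx:.3f}, {hy:.3f})")
print(f"robot hand: ({rx:.3f}, {ry:.3f})")
print(f"spatial misalignment: {error:.3f} m")
```

With these toy numbers the misalignment is on the order of a decimeter, large enough to break a grasp, which is the kind of error a robot-centric generation approach avoids by never mapping through human kinematics in the first place.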