Demonstrates that massive scaling of diverse simulator resets can replace manual curriculum engineering for complex dexterous manipulation.
arXiv · March 18, 2026 · 2603.15789
The Takeaway
The paper shows that long-horizon, contact-rich robot tasks can be solved without human demonstrations or complex reward shaping. This shifts the focus from task-specific engineering to programmatic data coverage in simulation.
From the abstract
Reinforcement learning in massively parallel physics simulations has driven major progress in sim-to-real robot learning. However, current approaches remain brittle and task-specific, relying on extensive per-task engineering to design rewards, curricula, and demonstrations. Even with this engineering, they often fail on long-horizon, contact-rich manipulation tasks and do not meaningfully scale with compute, as performance quickly saturates when training revisits the same narrow regions of stat