AI & ML Efficiency Breakthrough

The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.

arXiv · March 13, 2026 · 2603.11101

Chen Zhou, Haoran Sun, Hedan Yang, Jing Long, Junwu Xiong, Luqiao Wang, Mingxi Luo, Qiming Yang, Shuai Di, Song Wang, Tianyun Zhao, Wanting Xu, Wen Huang, Xiaodong Bai, Xiaomeng Tian, Xiaolong Xiang, Yicheng Gong, Yongjian Guo, Yucheng Guo, Yunxuan Ma, Yu Wei, Zhong Guan, Zhen Sun

Why it matters

This work provides a complete infrastructure stack for scaling robotics foundation models, including optimizations like variable-length FlashAttention and data packing. Reducing a training round from 15 hours to 22 minutes radically changes the iteration speed for developing general-purpose robots.

From the abstract

Embodied intelligence is a key step towards Artificial General Intelligence (AGI), yet its development faces multiple challenges including data, frameworks, infrastructure, and evaluation systems. To address these issues, we have, for the first time in the industry, launched a cloud-based, thousand-GPU distributed training platform for embodied intelligence, built upon the widely adopted LeRobot framework, and have systematically overcome bottlenecks across the entire pipeline. At the data layer

Read the original paper →

← Back to today's papers