AI & ML Scaling Insight

Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.

arXiv · March 13, 2026 · 2603.12151

Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar

Why it matters

As RL post-training (e.g., DeepSeek-R1 style) becomes standard, this paper provides the necessary prescriptive rules for compute-efficient training. It explains how to scale parallel rollouts to balance solution sharpening on easy tasks versus coverage on hard ones.

From the abstract

While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinforcement learning (RL) post-training of large language models (LLMs) remain poorly understood. We study the compute-optimal allocation of sampling compute for on-policy RL methods in LLMs, framing scaling as a compute-constrained optimization over three resources: parallel rollouts per problem, number of problems per batch, and number of update steps. We find that the compute-optimal number of para