This paper gives a theoretical proof that autocurriculum, in which a model selects its own training problems, requires exponentially fewer reasoning demonstrations to reach high accuracy.
March 20, 2026
Original Paper
Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum
arXiv · 2603.18325
The Takeaway
The result challenges the brute-force approach to scaling Chain-of-Thought (CoT) reasoning data. By focusing teacher supervision only on problems where the model struggles, autocurriculum decouples training cost from the quality of the reference teacher, drastically lowering the data threshold for high-accuracy reasoning.
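The selection mechanism described above can be sketched as a simple loop: probe each problem with cheap model rollouts, and spend costly teacher demonstrations only where the estimated success rate is low. This is an illustrative sketch, not the paper's algorithm; the names `autocurriculum_round`, `solve_prob`, and the threshold rule are all assumptions made for the example.

```python
import random

random.seed(0)

def autocurriculum_round(problems, solve_prob, teacher, n_rollouts=8, threshold=0.25):
    """One hypothetical autocurriculum round: estimate the model's success
    rate on each problem from cheap stochastic rollouts, then request a
    teacher demonstration only for problems the model currently fails."""
    demos = {}
    for p in problems:
        # Simulate n_rollouts attempts; solve_prob[p] stands in for the
        # model's true per-problem success probability.
        successes = sum(random.random() < solve_prob[p] for _ in range(n_rollouts))
        if successes / n_rollouts < threshold:   # the model struggles here
            demos[p] = teacher(p)                # spend teacher supervision
    return demos

# Toy setup: the model already solves "easy" problems and fails "hard" ones,
# so teacher traces are collected only for the hard ones.
solve_prob = {"easy-1": 1.0, "easy-2": 0.9, "hard-1": 0.1, "hard-2": 0.0}
teacher = lambda p: f"reference trace for {p}"
demos = autocurriculum_round(list(solve_prob), solve_prob, teacher)
```

Because supervision is requested only on the failing subset, the number of teacher demonstrations scales with the size of the model's current frontier rather than with the full problem pool, which is the intuition behind the decoupling claim above.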
From the abstract
Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both data and compute, as it involves collecting long traces of reasoning behavior from humans or synthetic generators and further post-training the model via reinforcement learning. Are these costs fundamental, or can they be reduced th…