AI & ML Paradigm Challenge

The most famous rule in AI training is actually wrong because it ignores how much it costs to keep the lights on once the model is built.

April 3, 2026

Original Paper

Test-Time Scaling Makes Overtraining Compute-Optimal

Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala

arXiv · 2604.01411

The Takeaway

By accounting for inference costs, the researchers found that "overtraining" models, i.e., training them on far more tokens than pretraining scaling laws like Chinchilla recommend, is actually the most compute-efficient path. This finding could shift how trillions of dollars are invested in future AI development.

From the abstract

Modern LLMs scale at test-time, e.g. via repeated sampling, where inference cost grows with model size and the number of samples. This creates a trade-off that pretraining scaling laws, such as Chinchilla, do not address. We present Train-to-Test ($T^2$) scaling laws that jointly optimize model size, training tokens, and number of inference samples under fixed end-to-end budgets. $T^2$ modernizes pretraining scaling laws with pass@$k$ modeling used for test-time scaling, then jointly optimizes p
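To make the trade-off in the abstract concrete, here is a toy sketch of jointly choosing model size, training tokens, and sample count under one end-to-end FLOP budget. The Chinchilla-style loss fit uses the published Chinchilla coefficients, but the loss-to-pass-rate mapping, the budget and query numbers, and the search grid are invented for illustration; none of this is the paper's actual $T^2$ functional form or fit.

```python
import math

# Illustrative sketch only: jointly pick model size N, training tokens D,
# and number of inference samples k to maximize pass@k under a single
# end-to-end FLOP budget. Not the paper's actual T^2 scaling law.

BUDGET = 1e21            # total FLOPs for training + inference (assumed)
QUERIES = 1e6            # inference queries served over the model's life (assumed)
TOKENS_PER_SAMPLE = 1e3  # output tokens per sampled attempt (assumed)

def single_sample_pass_rate(n_params, n_tokens):
    """Hypothetical pass@1 model. The loss term uses the published
    Chinchilla fit L(N, D); mapping loss -> pass rate via exp(-L) is
    an invented toy choice, not the paper's pass@k model."""
    loss = 1.69 + 406.4 / n_params**0.34 + 410.7 / n_tokens**0.28
    return math.exp(-loss)

def pass_at_k(p, k):
    """Chance that at least one of k independent samples passes."""
    return 1.0 - (1.0 - p) ** k

best = None
for n_params in [1e8, 3e8, 1e9, 3e9, 1e10]:
    for k in [1, 2, 4, 8, 16, 32]:
        # Standard FLOP approximations: ~6*N*D to train, ~2*N per inferred token.
        inference = 2 * n_params * k * TOKENS_PER_SAMPLE * QUERIES
        train_budget = BUDGET - inference
        if train_budget <= 0:
            continue  # this (N, k) pair leaves nothing for pretraining
        n_tokens = train_budget / (6 * n_params)
        score = pass_at_k(single_sample_pass_rate(n_params, n_tokens), k)
        if best is None or score > best[0]:
            best = (score, n_params, n_tokens, k)

score, n_params, n_tokens, k = best
print(f"best: N={n_params:.0e}, D={n_tokens:.0e}, k={k}, pass@k={score:.3f}")
```

Even in this crude toy, the optimum picks a mid-sized model with many samples rather than the largest model with one sample, which is the qualitative point: once inference samples count against the budget, the Chinchilla-optimal allocation is no longer optimal.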