The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.
March 25, 2026
Original Paper
Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits
arXiv · 2603.22339
The Takeaway
The paper identifies a numerical flaw in how the industry estimates compute-optimal model sizes and provides a more stable, unbiased alternative based on Variable Projection. This is a critical correction for anyone planning large-scale training runs.
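The Variable Projection idea exploits the fact that a Chinchilla-style loss curve is linear in its amplitude parameters once the exponents are fixed: the inner linear problem has a closed-form least-squares solution, so only the exponents need a nonlinear search. A minimal sketch (not the paper's implementation; the constants, noise level, and coarse grid search are illustrative assumptions):

```python
import numpy as np

# Hypothetical synthetic isoFLOP data: loss vs. model size N at one
# compute budget, L(N) = E + A*N^-alpha + B*N^beta (illustrative constants).
rng = np.random.default_rng(0)
Ns = np.logspace(7.5, 9.5, 15)
y = 1.69 + 406.4 / Ns**0.34 + 8.9e-4 * Ns**0.28 + rng.normal(0, 1e-3, Ns.size)

def varpro_sse(alpha, beta):
    """For fixed nonlinear exponents, the model E + A*N^-alpha + B*N^beta
    is linear in (E, A, B); solve that inner problem in closed form."""
    X = np.column_stack([np.ones_like(Ns), Ns**-alpha, Ns**beta])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ coef - y) ** 2)

# Outer search over the two nonlinear parameters only
# (a coarse grid here, purely for brevity).
alphas = np.linspace(0.2, 0.5, 31)
betas = np.linspace(0.1, 0.4, 31)
best = min((varpro_sse(a, b), a, b) for a in alphas for b in betas)
print(f"best (alpha, beta) = ({best[1]:.2f}, {best[2]:.2f}), SSE = {best[0]:.2e}")
```

Because the three amplitude parameters are eliminated analytically at every step, the outer search runs over a two-dimensional space instead of five, which is what makes the fit more stable than jointly optimizing all parameters.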
From the abstract
Chinchilla Approach 2 is among the most widely used methods for fitting neural scaling laws. Its parabolic approximation introduces systematic biases in compute-optimal allocation estimates, even on noise-free synthetic data. Applied to published Llama 3 IsoFLOP data at open frontier compute scales, these biases imply a parameter underallocation corresponding to 6.5% of the $3.8\times10^{25}$ FLOP training budget and \$1.4M (90% CI: \$412K-\$2.9M) in unnecessary compute at 50% H100 MFU.
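The bias is easy to reproduce: on an isoFLOP curve, loss as a function of log model size is a sum of exponentials, not a parabola, so the vertex of a fitted parabola lands off the true minimum even with zero noise. A minimal sketch with illustrative Chinchilla-style constants (not the paper's data or code):

```python
import numpy as np

# Synthetic Chinchilla-style loss (illustrative constants, not from the paper):
# L(N, D) = E + A / N**alpha + B / D**beta, with D = C / (6 N) at fixed compute C.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, C):
    D = C / (6 * N)
    return E + A / N**alpha + B / D**beta

C = 1e21                          # fixed compute budget (FLOPs), arbitrary choice
Ns = np.logspace(7.5, 9.5, 15)    # grid of model sizes along the isoFLOP curve
losses = loss(Ns, C)              # noise-free losses

# Approach 2: fit a parabola in log(N) and read off its vertex as N_opt.
a, b, c = np.polyfit(np.log(Ns), losses, 2)
N_parabola = np.exp(-b / (2 * a))

# True optimum of the (noise-free) loss, found by dense search.
N_dense = np.logspace(7.0, 10.0, 200000)
N_true = N_dense[np.argmin(loss(N_dense, C))]

print(f"parabola vertex N_opt: {N_parabola:.3e}")
print(f"true N_opt:            {N_true:.3e}")
print(f"relative bias:         {N_parabola / N_true - 1:+.1%}")
```

Even though the synthetic losses contain no noise at all, the vertex estimate differs from the true compute-optimal model size, illustrating how the parabolic approximation itself, rather than measurement error, drives the misallocation.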