The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.
March 25, 2026
Original Paper
Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits
arXiv · 2603.22339
The Takeaway
The paper identifies a numerical flaw in how the industry estimates compute-optimal model sizes and provides a more stable, unbiased alternative based on Variable Projection. This is a critical correction for anyone planning large-scale training runs.
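The Variable Projection idea exploits the fact that a Chinchilla-style loss curve is linear in its amplitude parameters once the exponents are fixed: the inner linear problem has a closed-form least-squares solution, so only the exponents need a nonlinear search. A minimal sketch (not the paper's implementation; the constants, noise level, and coarse grid search are illustrative assumptions):

```python
import numpy as np

# Hypothetical synthetic isoFLOP data: loss vs. model size N at one
# compute budget, L(N) = E + A*N^-alpha + B*N^beta (illustrative constants).
rng = np.random.default_rng(0)
Ns = np.logspace(7.5, 9.5, 15)
y = 1.69 + 406.4 / Ns**0.34 + 8.9e-4 * Ns**0.28 + rng.normal(0, 1e-3, Ns.size)

def varpro_sse(alpha, beta):
    """For fixed nonlinear exponents, the model E + A*N^-alpha + B*N^beta
    is linear in (E, A, B); solve that inner problem in closed form."""
    X = np.column_stack([np.ones_like(Ns), Ns**-alpha, Ns**beta])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((X @ coef - y) ** 2)

# Outer search over the two nonlinear parameters only
# (a coarse grid here, purely for brevity).
alphas = np.linspace(0.2, 0.5, 31)
betas = np.linspace(0.1, 0.4, 31)
best = min((varpro_sse(a, b), a, b) for a in alphas for b in betas)
print(f"best (alpha, beta) = ({best[1]:.2f}, {best[2]:.2f}), SSE = {best[0]:.2e}")
```

Because the three amplitude parameters are eliminated analytically at every step, the outer search runs over a two-dimensional space instead of five, which is what makes the fit more stable than jointly optimizing all parameters.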
From the abstract
Chinchilla Approach 2 is among the most widely used methods for fitting neural scaling laws. Its parabolic approximation introduces systematic biases in compute-optimal allocation estimates, even on noise-free synthetic data. Applied to published Llama 3 IsoFLOP data at open frontier compute scales, these biases imply a parameter underallocation corresponding to 6.5% of the $3.8\times10^{25}$ FLOP training budget and \$1.4M (90% CI: \$412K-\$2.9M) in unnecessary compute at 50% H100 MFU.
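The bias is easy to reproduce: on an isoFLOP curve, loss as a function of log model size is a sum of exponentials, not a parabola, so the vertex of a fitted parabola lands off the true minimum even with zero noise. A minimal sketch with illustrative Chinchilla-style constants (not the paper's data or code):

```python
import numpy as np

# Synthetic Chinchilla-style loss (illustrative constants, not from the paper):
# L(N, D) = E + A / N**alpha + B / D**beta, with D = C / (6 N) at fixed compute C.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, C):
    D = C / (6 * N)
    return E + A / N**alpha + B / D**beta

C = 1e21                          # fixed compute budget (FLOPs), arbitrary choice
Ns = np.logspace(7.5, 9.5, 15)    # grid of model sizes along the isoFLOP curve
losses = loss(Ns, C)              # noise-free losses

# Approach 2: fit a parabola in log(N) and read off its vertex as N_opt.
a, b, c = np.polyfit(np.log(Ns), losses, 2)
N_parabola = np.exp(-b / (2 * a))

# True optimum of the (noise-free) loss, found by dense search.
N_dense = np.logspace(7.0, 10.0, 200000)
N_true = N_dense[np.argmin(loss(N_dense, C))]

print(f"parabola vertex N_opt: {N_parabola:.3e}")
print(f"true N_opt:            {N_true:.3e}")
print(f"relative bias:         {N_parabola / N_true - 1:+.1%}")
```

Even though the synthetic losses contain no noise at all, the vertex estimate differs from the true compute-optimal model size, illustrating how the parabolic approximation itself, rather than measurement error, drives the misallocation.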