AI & ML Scaling Insight

Establishes a three-dimensional scaling law for RAG-pretraining, modeling the optimal data budget allocation between model parameters, tokens, and retrieval store size.

April 2, 2026

Original Paper

To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining

Karan Singh, Michael Yu, Varun Gangal, Zhuofu Tao, Sachin Kumar, Emmy Liu, Steven Y. Feng

arXiv · 2604.00715

The Takeaway

The paper provides the first quantitative framework for deciding when to stop scaling pretraining and instead invest in a larger retrieval corpus. This matters for practitioners building knowledge-intensive systems who must balance compute costs against inference-time accuracy.
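To make the trade-off concrete, the sketch below extends a Chinchilla-style loss with a retrieval-store term and grid-searches the split of a fixed data budget between pretraining tokens and retrieval documents. The functional form `L(N, D, R) = E + A/N^α + B/D^β + C/R^γ` and every constant here are illustrative assumptions, not the paper's fitted scaling law.

```python
# Hypothetical illustration of a three-term scaling law: a Chinchilla-style
# loss extended with a retrieval-store size term R. All constants and the
# functional form are assumptions for illustration, not fitted values.

def loss(params_n, tokens_d, store_r,
         E=1.7, A=400.0, B=410.0, C=120.0,
         alpha=0.34, beta=0.28, gamma=0.20):
    """Toy loss L(N, D, R) = E + A/N^alpha + B/D^beta + C/R^gamma."""
    return E + A / params_n**alpha + B / tokens_d**beta + C / store_r**gamma

def best_split(data_budget, params_n, grid=50):
    """Grid-search how to split a fixed data budget between pretraining
    tokens D and retrieval store size R, with D + R == data_budget."""
    best = None
    for i in range(1, grid):
        d = data_budget * i / grid          # tokens spent on pretraining
        r = data_budget - d                  # tokens placed in the store
        l = loss(params_n, d, r)
        if best is None or l < best[0]:
            best = (l, d, r)
    return best

# Example: split a 1e9-token data budget for a 1e8-parameter model.
l, d, r = best_split(1e9, 1e8)
```

Under these toy exponents the optimum leans toward pretraining tokens (beta > gamma), but the point of the exercise is the shape of the decision, not the numbers: once the marginal loss reduction from extra pretraining tokens drops below that of extra retrieval documents, the budget should shift to the store.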

From the abstract

Retrieval-augmented generation (RAG) improves language model (LM) performance by providing relevant context at test time for knowledge-intensive situations. However, the relationship between parametric knowledge acquired during pretraining and non-parametric knowledge accessed via retrieval remains poorly understood, especially under fixed data budgets. In this work, we systematically study the trade-off between pretraining corpus size and retrieval store size across a wide range of model and da