Shows that mid-training on high-quality reasoning data is the primary driver of model capability, while RL succeeds only as a sparse refinement step on top of it.
March 19, 2026
Original Paper
PRISM: Demystifying Retention and Interaction in Mid-Training
arXiv · 2603.17074
The Takeaway
PRISM shows that math and code gains (up to 40 points) are won during a ~27B-token mid-training phase that restructures 90% of the model's weights. This challenges the 'RL is magic' narrative: RL's success depends entirely on the representational geometry established during mid-training.
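The "90% of weights restructured" figure is the kind of statistic that falls out of diffing checkpoints before and after mid-training. Here is a minimal sketch of such a measurement, assuming two PyTorch state dicts with matching keys; the function name, file paths, and the threshold `tol` are illustrative choices, not the paper's exact metric:

```python
import torch

def fraction_changed(base_state: dict, mid_state: dict, tol: float = 1e-6) -> float:
    """Fraction of scalar parameters that moved by more than `tol`
    between a base checkpoint and its mid-trained counterpart."""
    changed, total = 0, 0
    for name, base_param in base_state.items():
        # Compare in float32 to avoid bf16/fp16 rounding artifacts.
        delta = (mid_state[name].float() - base_param.float()).abs()
        changed += int((delta > tol).sum())
        total += base_param.numel()
    return changed / total

# Hypothetical usage: load the two checkpoints and compare.
# base = torch.load("base.pt", map_location="cpu")
# mid = torch.load("mid_trained.pt", map_location="cpu")
# print(f"{fraction_changed(base, mid):.1%} of weights moved")
```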
From the abstract
We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and attention-Mamba hybrid), and scales from 3B to 24B parameters, we show that mid-training on approximately 27B high-quality tokens yields consistent gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on …