Shows that mid-training on high-quality reasoning data is the primary driver of model capability, while RL succeeds only as a sparse refinement step on top of it.
March 19, 2026
Original Paper
PRISM: Demystifying Retention and Interaction in Mid-Training
arXiv · 2603.17074
The Takeaway
PRISM shows that math and code gains (up to 40 points) are won during a ~27B-token mid-training phase that restructures 90% of the model's weights. This challenges the 'RL is magic' narrative: RL's success depends entirely on the representational geometry established during mid-training.
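The "90% of weights restructured" figure is the kind of statistic that falls out of diffing checkpoints before and after mid-training. Here is a minimal sketch of such a measurement, assuming two PyTorch state dicts with matching keys; the function name, file paths, and the threshold `tol` are illustrative choices, not the paper's exact metric:

```python
import torch

def fraction_changed(base_state: dict, mid_state: dict, tol: float = 1e-6) -> float:
    """Fraction of scalar parameters that moved by more than `tol`
    between a base checkpoint and its mid-trained counterpart."""
    changed, total = 0, 0
    for name, base_param in base_state.items():
        # Compare in float32 to avoid bf16/fp16 rounding artifacts.
        delta = (mid_state[name].float() - base_param.float()).abs()
        changed += int((delta > tol).sum())
        total += base_param.numel()
    return changed / total

# Hypothetical usage: load the two checkpoints and compare.
# base = torch.load("base.pt", map_location="cpu")
# mid = torch.load("mid_trained.pt", map_location="cpu")
# print(f"{fraction_changed(base, mid):.1%} of weights moved")
```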
From the abstract
We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and attention-Mamba hybrid), and scales from 3B to 24B parameters, we show that mid-training on approximately 27B high-quality tokens yields consistent gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on …