AI & ML Paradigm Challenge

Two models with the same pretraining loss can end up with very different downstream capabilities.

April 14, 2026

Original Paper

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

arXiv · 2604.09258

The Takeaway

The paper shows that 'common minima' across data sources, not just the loss value, determine generalization. It challenges the industry-standard belief that minimizing training loss is the ultimate proxy for model quality.
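
To make the takeaway concrete, here is a minimal NumPy sketch with made-up per-source losses: two models reach the same aggregate pretraining loss, but only one sits in a minimum that is low on every data source at once. The specific numbers, the four sources, and the uniform mixture weights are illustrative assumptions, not results from the paper.

import numpy as np

# Hypothetical per-source training losses over four pretraining domains
# (general language, math, code, reasoning). Numbers are made up for illustration.
losses_a = np.array([1.9, 2.1, 2.0, 2.0])  # Model A: uniformly low everywhere (a 'common minimum')
losses_b = np.array([1.2, 2.8, 1.4, 2.6])  # Model B: trades one source off against another

# With uniform mixture weights, the aggregate pretraining loss is identical...
print(losses_a.mean(), losses_b.mean())  # 2.0 vs 2.0

# ...but the spread across sources is very different, which is the kind of
# geometric distinction the paper argues matters for downstream generalization.
print(losses_a.std(), losses_b.std())  # ~0.07 vs ~0.71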

From the abstract

Pretraining is the cornerstone of Large Language Models (LLMs), consuming the vast majority of the computational budget and data and serving as the primary engine for their capabilities. During pretraining, LLMs acquire foundational knowledge from unprecedentedly massive and diverse data sources, encompassing a vast array of domains such as general language, mathematics, code, and complex reasoning. In this work, we investigate an interesting geometric question regarding the converged state of pretraining…