AI & ML Scaling Insight

Uses the Minimum Description Length principle to predict when neural networks will transition from simple, spurious shortcuts to learning complex, robust features.

March 30, 2026

Original Paper

A Compression Perspective on Simplicity Bias

Tom Marty, Eric Elmoznino, Leo Gagnon, Tejas Kasetty, Mizu Nishikawa-Toomey, Sarthak Mittal, Guillaume Lajoie, Dhanya Sridhar

arXiv · 2603.25839

The Takeaway

The paper provides a theoretical 'map' showing how data volume influences which features a network selects. This helps practitioners diagnose why models fail (by latching onto simple but wrong features) and estimate how much data is needed to force the learning of robust cues.

From the abstract

Deep neural networks exhibit a simplicity bias, a well-documented tendency to favor simple functions over complex ones. In this work, we cast new light on this phenomenon through the lens of the Minimum Description Length principle, formalizing supervised learning as a problem of optimal two-part lossless compression. Our theory explains how simplicity bias governs feature selection in neural networks through a fundamental trade-off between model complexity (the cost of describing the hypothesis …
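The two-part trade-off the abstract describes can be sketched numerically. The toy below is a hypothetical illustration, not the paper's method: all bit counts, probabilities, and feature names are invented. It scores each hypothesis by its description length plus the bits needed to encode the labels under its predictions, and shows the winner flipping as data volume grows: with little data, a cheap-to-describe shortcut feature wins; with enough data, its residual cost forces the costlier but robust feature.

```python
import math

def two_part_code_length(model_bits, data, predict):
    """Two-part MDL score: bits to describe the hypothesis (model_bits)
    plus bits to encode each binary label under the model's predicted
    probability (a Shannon code on the residuals)."""
    data_bits = 0.0
    for x, y in data:
        p1 = predict(x)                       # predicted P(y = 1)
        p_y = p1 if y == 1 else 1.0 - p1
        data_bits -= math.log2(max(p_y, 1e-12))
    return model_bits + data_bits

def make_data(n):
    """Toy dataset: the 'robust' feature always matches the label; the
    'shortcut' feature matches it 90% of the time (flipped every 10th row)."""
    data = []
    for i in range(n):
        y = i % 2
        shortcut = 1 - y if i % 10 == 0 else y
        data.append(((shortcut, y), y))       # x = (shortcut, robust)
    return data

# Hypothetical hypotheses: the simple one reads only the shortcut feature
# and is cheap to describe; the complex one reads the robust feature but
# costs many more bits to specify. All numbers are made up for illustration.
simple_h  = lambda x: 0.9  if x[0] == 1 else 0.1
complex_h = lambda x: 0.99 if x[1] == 1 else 0.01

def winner(n, simple_bits=10.0, complex_bits=200.0):
    s = two_part_code_length(simple_bits,  make_data(n), simple_h)
    c = two_part_code_length(complex_bits, make_data(n), complex_h)
    return "simple" if s < c else "complex"

print(winner(50))    # small dataset: the cheap shortcut hypothesis wins
print(winner(2000))  # large dataset: residual cost favors the robust feature
```

The crossover point (here a few hundred samples) is exactly the kind of quantity the paper's theory aims to predict: how much data is needed before fitting the residuals of a spurious shortcut costs more bits than describing the complex feature outright.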