Training an AI on imbalanced, power-law data can make it a better reasoner than a perfectly balanced, curated dataset.
April 29, 2026
Original Paper
The Power of Power Law: Asymmetry Enables Compositional Reasoning
arXiv · 2604.22951
The Takeaway
Most engineers try to build training sets where every category is equally represented, hoping to avoid bias. This research shows that data following a power-law distribution, where a few concepts appear constantly and most appear only rarely, is actually better for teaching complex reasoning. The imbalance acts as a catalyst that forces the model to learn how to compose different concepts together instead of treating each one in isolation. Models trained this way outperformed those trained on uniform data across multiple logic tasks. The finding suggests we should stop trying to flatten our datasets and instead embrace the natural unevenness of the world.
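To see what the two regimes look like side by side, here is a minimal Python sketch contrasting power-law (Zipfian) and uniform sampling over a set of training categories. It is illustrative only: the category count, the Zipf exponent, and the `zipf_weights` helper are assumptions for this example, not details taken from the paper.

```python
import numpy as np

def zipf_weights(n: int, alpha: float = 1.0) -> np.ndarray:
    """Zipfian weights: the k-th most frequent category gets weight 1 / k**alpha."""
    ranks = np.arange(1, n + 1)
    w = 1.0 / ranks**alpha
    return w / w.sum()

rng = np.random.default_rng(0)
n_categories, n_samples = 100, 10_000

# Power-law sampling: a few head categories dominate; the tail is sparse.
power_ids = rng.choice(n_categories, size=n_samples, p=zipf_weights(n_categories))

# Uniform sampling: every category appears roughly equally often.
uniform_ids = rng.choice(n_categories, size=n_samples)

print("top-5 share, power-law:", (power_ids < 5).mean())  # ~0.44 for alpha=1
print("top-5 share, uniform:  ", (uniform_ids < 5).mean())  # ~0.05
```

The point of the contrast: under the power law, roughly 44 of every 100 examples come from just 5 of the 100 categories, which is exactly the kind of skew that balancing pipelines are usually built to remove.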
From the abstract
Natural language data follows a power-law distribution, with most knowledge and skills appearing at very low frequency. While a common intuition suggests that reweighting or curating data towards a uniform distribution may help models better learn these long-tail skills, we find a counterintuitive result: across a wide range of compositional reasoning tasks, such as state tracking and multi-step arithmetic, training under power-law distributions consistently outperforms training under uniform distributions.
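To make "compositional task under a power law" concrete, below is a hypothetical generator in the spirit of the multi-step arithmetic tasks the abstract mentions. The `OPS` table, the `make_task` function, and the Zipf weighting over operators are illustrative assumptions for this sketch, not the paper's actual task construction.

```python
import random

# Hypothetical primitive skills; the paper's actual setup may differ.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def make_task(depth, op_weights, rng):
    """Compose `depth` primitive operations into one multi-step problem."""
    value = rng.randint(1, 9)
    expr = str(value)
    for _ in range(depth):
        op = rng.choices(list(OPS), weights=op_weights, k=1)[0]
        operand = rng.randint(1, 9)
        value = OPS[op](value, operand)
        expr = f"({expr} {op} {operand})"
    return expr, value

rng = random.Random(0)
zipf = [1.0 / (k + 1) for k in range(len(OPS))]  # "+" common, "*" rare
problem, answer = make_task(depth=3, op_weights=zipf, rng=rng)
print(problem, "=", answer)
```

Each generated problem chains several primitives, so a model cannot solve the rare compositions by memorization alone; it has to reuse the operations it saw frequently in the head of the distribution.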