AI & ML Scaling Insight

Demonstrates a 'distributional simplicity bias' in diffusion models: low-order (pair-wise) statistics are learned with linear sample complexity, while higher-order correlations require cubic sample complexity.

arXiv · March 16, 2026 · 2603.12901

Lorenzo Bardone, Claudia Merger, Sebastian Goldt

Why it matters

It provides a theoretical explanation for the data requirements of generative models, showing how and when models transition from learning simple pair-wise patterns to complex higher-order structure such as textures.

From the abstract

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely co […]
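To make the distinction between pair-wise statistics and higher-order correlations concrete, here is a minimal illustrative sketch (not the paper's mixed cumulant model): two distributions with identical first- and second-order statistics that only a fourth-order cumulant can tell apart. The names and tolerances are our own, chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two datasets with matching first- and second-order (pair-wise) statistics:
# a standard Gaussian, and a Laplace distribution rescaled to unit variance.
gauss = rng.standard_normal(n)
laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)  # variance = 2 * scale**2 = 1

def excess_kurtosis(x):
    """Fourth-order cumulant of a standardised sample (zero for a Gaussian)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

# Pair-wise statistics agree: both variances are ~1.0...
print(gauss.var(), laplace.var())
# ...but the fourth-order cumulant separates them:
print(excess_kurtosis(gauss))    # ≈ 0.0 for the Gaussian
print(excess_kurtosis(laplace))  # ≈ 3.0 for the Laplace
```

A model that has only learned pair-wise statistics cannot distinguish these two distributions; capturing the higher-order cumulant is what, in the paper's framing, costs the additional samples.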