Deep networks consistently assign higher density to simpler images, regardless of training data or architectural complexity.
April 2, 2026
Original Paper
Deep Networks Favor Simple Data
arXiv · 2604.00394
The Takeaway
Challenges the assumption that likelihood indicates 'typicality.' Shows that even models trained on a single complex sample still assign higher density to simple OOD data (e.g., SVHN over CIFAR-10), pointing to a universal architectural bias toward low-complexity signals.
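The anomaly is easy to reproduce in miniature. A minimal sketch, assuming nothing about the paper's actual models: fit a per-pixel Gaussian to high-variance "complex" data (a stand-in for CIFAR-10) and compare its average log-likelihood on held-out in-distribution data versus lower-variance "simple" data (a stand-in for SVHN). The simpler data concentrates near the mean and receives strictly higher density, even though the model never saw it.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Complex" in-distribution data: high-variance pixels (CIFAR-10 stand-in)
train = rng.normal(0.0, 1.0, size=(10_000, 64))
test = rng.normal(0.0, 1.0, size=(10_000, 64))
# "Simple" OOD data: low-variance pixels (SVHN stand-in)
ood = rng.normal(0.0, 0.3, size=(10_000, 64))

# Fit an independent Gaussian per dimension to the training set
mu, sigma = train.mean(axis=0), train.std(axis=0)

def mean_log_lik(x):
    # Per-sample Gaussian log-likelihood, averaged over the whole set
    z = (x - mu) / sigma
    per_dim = -0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return per_dim.sum(axis=1).mean()

print("in-distribution test:", mean_log_lik(test))
print("simple OOD:          ", mean_log_lik(ood))  # higher than test
```

This toy model is far simpler than a deep network, but it captures the same geometry: low-complexity inputs sit in the high-density core of the fitted distribution, while typical in-distribution samples live on a lower-density shell.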
From the abstract
Estimated density is often interpreted as indicating how typical a sample is under a model. Yet deep models trained on one dataset can assign *higher* density to simpler out-of-distribution (OOD) data than to in-distribution test data. We refer to this behavior as the OOD anomaly. Prior work typically studies this phenomenon within a single architecture, detector, or benchmark, implicitly assuming certain canonical densities. We instead separate the trained network from the density estimator…