This theoretical work challenges the 'Garbage In, Garbage Out' mantra for modern ML, arguing that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.
arXiv · March 16, 2026 · 2603.12288
Why it matters
It provides a formal explanation for why modern models thrive on noisy, collinear tabular data. This insight, grounded in Information Theory and Latent Factor Models, helps practitioners decide when to focus on adding more features versus cleaning existing ones.
From the abstract
Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles from Information Theory, Latent Factor Models, and Psychometrics, clarifying that predictive robustness arises not solely from data cleanliness, but from the synergy between data architecture and model capacity. Partitioning predictor-space "noise" …
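The core claim lends itself to a quick illustration. Below is a minimal simulation sketch (ours, not from the paper), assuming the latent-factor setup the abstract names: every feature is an error-prone proxy of a single latent factor z that drives the target, so a high-dimensional linear model can average per-feature measurement error away as the feature count grows. All specifics here (the noise_sd level, Ridge(alpha=1.0), sample sizes) are illustrative choices, not the authors' method.

```python
# Sketch: collinear, error-prone features as noisy proxies of one latent
# factor. As p grows, ridge regression averages out per-feature noise,
# so test R^2 improves even though each individual feature is "garbage".
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_train, n_test = 500, 500

def simulate(p, noise_sd=2.0):
    """Latent factor z drives y; each of p features is z plus heavy noise."""
    z = rng.normal(size=n_train + n_test)
    y = z + 0.5 * rng.normal(size=z.shape)                     # target driven by z
    X = z[:, None] + noise_sd * rng.normal(size=(z.size, p))   # collinear noisy proxies
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

for p in [1, 10, 100, 1000]:
    X_tr, X_te, y_tr, y_te = simulate(p)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    print(f"p={p:5d}  test R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```

Under these settings, test R² should climb from roughly 0.16 at p = 1 toward the ceiling of about 0.8 set by the irreducible target noise, illustrating how added (still noisy) features can substitute for cleaning existing ones.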