AI & ML Breaks Assumption

Proves that rotation-invariant algorithms, such as standard gradient descent, are fundamentally suboptimal for sparse targets when trained on sampled hard labels.

March 24, 2026

Original Paper

Hard labels sampled from sparse targets mislead rotation invariant algorithms

Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

arXiv · 2603.20967

The Takeaway

This challenges the default use of rotation-invariant neural architectures and standard SGD for sparse classification tasks (like tabular data). It provides a theoretical explanation for why non-rotation-invariant methods (like decision trees) often outperform deep learning in sparse regimes, suggesting a need for a shift in how we design optimizers and architectures for sparse data.
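To see why rotation invariance matters here, the following sketch (not from the paper, just a minimal numpy illustration) shows the key structural fact: gradient descent from a zero initialization is rotation-equivariant, so rotating the input features rotates the learned weights correspondingly and leaves every prediction unchanged. An algorithm with this property cannot exploit the fact that a sparse target is aligned with the coordinate axes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[0] = 3.0  # sparse target: a single active coordinate

# Sampled hard labels in {-1, +1} from the logistic model
p = 1 / (1 + np.exp(-X @ w_star))
y = np.where(rng.random(n) < p, 1.0, -1.0)

def gd_logistic(X, y, steps=500, lr=0.1):
    """Plain gradient descent on the logistic loss, starting from zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1 + np.exp(margins)))) / len(y)
        w -= lr * grad
    return w

# Random orthogonal rotation of the feature space
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

w = gd_logistic(X, y)          # trained on original features
w_rot = gd_logistic(X @ Q, y)  # trained on rotated features

# Predictions coincide: GD never "sees" which basis the sparsity lives in.
print(np.allclose(X @ w, (X @ Q) @ w_rot, atol=1e-6))  # True (up to float error)
```

By contrast, axis-aligned methods such as decision trees or coordinate-wise algorithms would behave very differently on `X` and `X @ Q`, which is exactly the degree of freedom the paper argues is needed in sparse regimes.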

From the abstract

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable […]
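The two feedback modes the abstract describes can be made concrete with a short numpy sketch (an illustration under simulated data, not the paper's construction). Note that the hard-label logistic gradient is an unbiased estimate of the soft-label gradient, so the two agree in expectation; the paper's point is that this is not enough to save rotation-invariant algorithms in sparse regimes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[0] = 2.0  # sparse target

p = 1 / (1 + np.exp(-X @ w_star))                # true P(y = +1 | x)
y_soft = p                                       # soft labels (distillation-style)
y_hard = np.where(rng.random(n) < p, 1.0, -1.0)  # sampled hard labels in {-1, +1}

def grad_hard(w, X, y):
    """Logistic-loss gradient with +-1 labels."""
    margins = y * (X @ w)
    return -(X.T @ (y / (1 + np.exp(margins)))) / len(y)

def grad_soft(w, X, p):
    """Gradient when the target is the true probability p = P(y=+1|x)."""
    s = 1 / (1 + np.exp(-(X @ w)))
    return X.T @ (s - p) / len(p)

w0 = np.zeros(d)
# Small gap: the hard-label gradient is unbiased for the soft one,
# differing only by label-sampling noise of order 1/sqrt(n).
print(np.abs(grad_hard(w0, X, y_hard) - grad_soft(w0, X, y_soft)).max())
```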