AI & ML Breaks Assumption

Identifies "ghosts of softmax," complex singularities that cap the Taylor convergence radius of the cross-entropy loss, explaining why models collapse at specific step sizes.

arXiv · March 17, 2026 · 2603.13552

Piyush Sao

The Takeaway

The paper moves optimization theory beyond real-line curvature by proving that the nearest complex zeros of the softmax partition function dictate the safe step-size limit. This lets practitioners derive a "safe" step-size controller from a single Jacobian-vector product, preventing training divergence in high-learning-rate regimes.
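The controller idea can be sketched numerically. The code below is an illustrative heuristic, not the paper's actual rule: it estimates the directional logit velocity with one extra forward pass (a finite-difference stand-in for a Jacobian-vector product), then uses the exact two-class zero location pairwise as a proxy for the distance to the nearest zero of the full partition function. The linear `logits` model and the `safety` factor are assumptions for the example.

```python
import numpy as np

def logits(w, x):
    # Toy linear model: logits = W @ x (hypothetical stand-in for any network).
    return w @ x

def safe_lr(w, x, d, eps=1e-6, safety=0.5):
    """Heuristic step-size cap from pairwise 'ghost' distances.

    Assumption: treats each logit pair in isolation, using the two-class
    zero location as a proxy for the nearest zero of the full partition
    function. Illustrative only; not the paper's derivation.
    """
    z = logits(w, x)
    # Jacobian-vector product via finite difference: dz ~= J(w) @ d.
    dz = (logits(w + eps * d, x) - z) / eps
    radius = np.inf
    K = len(z)
    for j in range(K):
        for k in range(j + 1, K):
            dv = dz[j] - dz[k]
            if abs(dv) < 1e-12:
                continue  # pair gap does not move along this direction
            # Two-class ghost: (z_j - z_k) + t*(dz_j - dz_k) = i*pi,
            # so the nearest zero sits at |t| = sqrt(pi^2 + gap^2) / |dv|.
            r = np.hypot(np.pi, z[j] - z[k]) / abs(dv)
            radius = min(radius, r)
    return safety * radius

# Usage: cap the learning rate for one proposed update direction.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))
x = rng.normal(size=2)
d = rng.normal(size=(3, 2))   # proposed update direction in weight space
eta = safe_lr(w, x, d)
```

The `min` over pairs makes the bound conservative: the step is kept shorter than the distance to the closest pairwise ghost along the update direction.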

From the abstract

Optimization analyses for cross-entropy training rely on local Taylor models of the loss to predict whether a proposed step will decrease the objective. These surrogates are reliable only inside the Taylor convergence radius of the true loss along the update direction. That radius is set not by real-line curvature alone but by the nearest complex singularity. For cross-entropy, the softmax partition function $F=\sum_j \exp(z_j)$ has complex zeros, the "ghosts of softmax," that induce logarithmic singularities in the loss.
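In the two-class case the nearest ghost can be located in closed form, which makes the abstract's claim concrete. Along an update direction with logit velocities $(d_1, d_2)$, $F(t) = e^{z_1 + t d_1} + e^{z_2 + t d_2}$ vanishes exactly when $(z_1 - z_2) + t(d_1 - d_2) = i\pi(2k+1)$; the smallest $|t|$ is the Taylor convergence radius of $\log F$ along that direction. The specific numbers below are an assumed illustration:

```python
import numpy as np

# Current logits and directional logit velocities (hypothetical values).
z1, z2 = 2.0, -1.0
d1, d2 = 0.7, -0.3

# Nearest zero of F(t) = exp(z1 + t*d1) + exp(z2 + t*d2): take k = 0 in
# (z1 - z2) + t*(d1 - d2) = i*pi*(2k+1).
t_star = (1j * np.pi - (z1 - z2)) / (d1 - d2)

# Verify F vanishes at the ghost, and read off the convergence radius.
F = np.exp(z1 + t_star * d1) + np.exp(z2 + t_star * d2)
radius = abs(t_star)   # Taylor radius of log F along this direction
```

Even though the logits and direction are entirely real, the singularity that limits the step size sits off the real axis, which is why real-line curvature alone cannot predict it.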