Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.
arXiv · March 16, 2026 · 2603.13085
Why it matters
It challenges the conventional use of kernel frameworks to explain attention, showing that this non-convergence is the source of both attention's power and its specific vulnerability to training-time adversarial attacks.
From the abstract
Understanding the theoretical foundations of attention mechanisms remains challenging due to their complex, non-linear dynamics. This work reveals a fundamental trade-off in the learning dynamics of linearized attention. For a linearized attention mechanism with an exact correspondence to a data-dependent, Gram-induced kernel, both empirical and theoretical analyses within the Neural Tangent Kernel (NTK) framework show that linearized attention does not converge to its infinite-width NTK limit, …
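To make the abstract's setup concrete, here is a minimal, hypothetical sketch (not the paper's code) of the quantity under study: the empirical NTK of a softmax-free, linearized attention layer Attn(X) = (X Wq)(X Wk)ᵀ(X Wv). The kernel entry Θ(X1, X2) is the inner product of parameter gradients of a scalar readout; because these gradients depend on the data through the Gram matrix X Xᵀ, the kernel is data-dependent, which is the property the paper connects to non-convergence toward the fixed infinite-width NTK.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8  # tokens, feature dimension (illustrative sizes)

def attn_scalar(X, params):
    """Scalar readout of a linearized (softmax-free) attention layer."""
    Wq, Wk, Wv = params
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return np.sum((Q @ K.T) @ V)

def grad_params(X, params, eps=1e-5):
    """Finite-difference gradient of the scalar readout w.r.t. all weights."""
    flat = np.concatenate([p.ravel() for p in params])
    g = np.zeros_like(flat)
    base = attn_scalar(X, params)
    for i in range(flat.size):
        bumped = flat.copy()
        bumped[i] += eps
        # rebuild the parameter list from the bumped flat vector
        ps, off = [], 0
        for p in params:
            ps.append(bumped[off:off + p.size].reshape(p.shape))
            off += p.size
        g[i] = (attn_scalar(X, ps) - base) / eps
    return g

# NTK-style 1/sqrt(d) initialization of the three projection matrices
params = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
X1 = rng.normal(size=(n, d))
X2 = rng.normal(size=(n, d))

# Empirical NTK entry: Theta(X1, X2) = <grad_theta f(X1), grad_theta f(X2)>
theta = grad_params(X1, params) @ grad_params(X2, params)
print(theta)
```

In the infinite-width NTK regime this kernel would stay (approximately) frozen at its initialization value during training; the paper's claim is that for linearized attention the finite-width kernel keeps moving with the data instead.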