AI & ML Breaks Assumption

Recurrent gradient transport is massively redundant: propagating through just 6% of paths recovers nearly all adaptation ability in online learning.

arXiv · March 17, 2026 · 2603.15195

Aur Shalev Merin

The Takeaway

The result challenges the conventional wisdom that exact gradients are required for effective online learning in RNNs and Transformers. This 'selection-invariant' redundancy paves the way for ultra-sparse, efficient online learning on edge devices, where full Jacobian propagation is too expensive.

From the abstract

Real-time recurrent learning (RTRL) computes exact online gradients by propagating a Jacobian tensor forward through recurrent dynamics, but at O(n^4) cost per step. Prior work has sought structured approximations (rank-1 compression, graph-based sparsity, Kronecker factorization). We show that, in the continuous error signal regime, the recurrent Jacobian is massively redundant: propagating through a random 6% of paths (k=4 of n=64) recovers 84 ± 6% of full RTRL's adaptation ability across fiv
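To make the idea concrete, here is a minimal sketch of sparse forward sensitivity propagation, assuming a vanilla tanh RNN with n=64 hidden units and the paper's k=4 random paths. The paper's exact estimator, rescaling, and benchmarks are not specified in the excerpt, so the unbiased column-sampling rule below (propagate through k sampled units, rescale by n/k) is an illustrative assumption, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 64, 4, 20  # hidden size, sampled paths per step, rollout length

W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # recurrent weights
h = np.zeros(n)
J_exact = np.eye(n)   # sensitivity dh_t/dh_0, propagated through all n paths
J_sparse = np.eye(n)  # same quantity, propagated through k random paths

for t in range(T):
    x = rng.normal(size=n)
    h = np.tanh(W @ h + x)
    D = np.diag(1.0 - h**2)  # tanh' at the new state
    # Exact RTRL-style forward propagation: J_t = D_t W J_{t-1}
    J_exact = D @ W @ J_exact
    # Sparse variant: route the product through only k of n recurrent paths,
    # rescaled by n/k so each step is an unbiased estimate of D @ W @ J
    idx = rng.choice(n, size=k, replace=False)
    J_sparse = D @ (W[:, idx] @ J_sparse[idx, :]) * (n / k)

# Cosine similarity between the exact and sparse sensitivity matrices
cos = np.sum(J_exact * J_sparse) / (
    np.linalg.norm(J_exact) * np.linalg.norm(J_sparse) + 1e-12
)
```

Each sparse step costs O(nk) instead of O(n^2) for the matrix product, which is where the claimed efficiency on edge devices would come from; the variance compounds over time, which is presumably why the continuous-error-signal regime matters.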