Demonstrates that integer multiplication is not inherently a long-range dependency problem, and that current architectures like Transformers and Mamba are operating in the wrong 'computational spacetime.'
April 1, 2026
Original Paper
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
arXiv · 2603.29069
The Takeaway
By reframing multiplication as a local operation on a 2D grid, a tiny 321-parameter model achieves perfect length generalization (683x its training range) where massive Transformers fail. This suggests that many 'hard' reasoning tasks may simply be poorly represented for current sequence models.
From the abstract
Integer multiplication has long been considered a hard problem for neural networks, with the difficulty widely attributed to the O(n) long-range dependency induced by carry chains. We argue that this diagnosis is wrong: long-range dependency is not an intrinsic property of multiplication, but a mirage produced by the choice of computational spacetime. We formalize the notion of mirage and provide a constructive proof: when two n-bit binary integers are laid out as a 2D outer-product grid, every
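To make the 2D reframing concrete, here is a minimal sketch (not the paper's model, just the classical arithmetic it builds on): laying two n-bit integers out as an outer-product grid, summing each anti-diagonal into a column count, and then propagating carries one column at a time. Every step touches only neighboring cells, so no long-range dependency appears. The function name and structure are illustrative assumptions, not taken from the paper.

```python
def multiply_via_grid(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit integers via a 2D outer-product grid.

    Illustrative sketch: each grid cell (i, j) holds a_i AND b_j, and
    all subsequent operations are local (column sums and single-step
    carry propagation between adjacent columns).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    # 2D outer-product grid: cell (i, j) contributes to output bit i + j.
    grid = [[a_bits[i] & b_bits[j] for j in range(n)] for i in range(n)]

    # Sum each anti-diagonal (a purely local reduction over the grid).
    col = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            col[i + j] += grid[i][j]

    # Carry propagation: information moves only one column per step.
    result, carry = 0, 0
    for k in range(2 * n):
        s = col[k] + carry
        result |= (s & 1) << k
        carry = s >> 1
    return result
```

For example, `multiply_via_grid(13, 11)` returns 143, and the result is exact for any pair of operands that fit in `n` bits. The point of the sketch is that the carry chain, which looks like an O(n) dependency in a 1D sequence layout, becomes a chain of strictly nearest-neighbor updates once the computation is laid out on the grid.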