Demonstrates that integer multiplication is not inherently a long-range dependency problem, and that current architectures like Transformers and Mamba are operating in the wrong 'computational spacetime.'
April 1, 2026
Original Paper
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
arXiv · 2603.29069
The Takeaway
By reframing multiplication as a local operation on a 2D grid, a tiny 321-parameter model achieves perfect length generalization (683x its training range) where massive Transformers fail. This suggests that many 'hard' reasoning tasks may simply be poorly represented for current sequence models.
From the abstract
Integer multiplication has long been considered a hard problem for neural networks, with the difficulty widely attributed to the O(n) long-range dependency induced by carry chains. We argue that this diagnosis is wrong: long-range dependency is not an intrinsic property of multiplication, but a mirage produced by the choice of computational spacetime. We formalize the notion of mirage and provide a constructive proof: when two n-bit binary integers are laid out as a 2D outer-product grid, every
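To make the 2D reframing concrete, here is a minimal sketch (not the paper's model, just the classical arithmetic it builds on): laying two n-bit integers out as an outer-product grid, summing each anti-diagonal into a column count, and then propagating carries one column at a time. Every step touches only neighboring cells, so no long-range dependency appears. The function name and structure are illustrative assumptions, not taken from the paper.

```python
def multiply_via_grid(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit integers via a 2D outer-product grid.

    Illustrative sketch: each grid cell (i, j) holds a_i AND b_j, and
    all subsequent operations are local (column sums and single-step
    carry propagation between adjacent columns).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    # 2D outer-product grid: cell (i, j) contributes to output bit i + j.
    grid = [[a_bits[i] & b_bits[j] for j in range(n)] for i in range(n)]

    # Sum each anti-diagonal (a purely local reduction over the grid).
    col = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            col[i + j] += grid[i][j]

    # Carry propagation: information moves only one column per step.
    result, carry = 0, 0
    for k in range(2 * n):
        s = col[k] + carry
        result |= (s & 1) << k
        carry = s >> 1
    return result
```

For example, `multiply_via_grid(13, 11)` returns 143, and the result is exact for any pair of operands that fit in `n` bits. The point of the sketch is that the carry chain, which looks like an O(n) dependency in a 1D sequence layout, becomes a chain of strictly nearest-neighbor updates once the computation is laid out on the grid.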