AI & ML Paradigm Shift

The 'routing paradox' shows that selective attention requires the very pairwise computations it aims to replace, which explains why purely recurrent models fail at associative recall.

March 24, 2026

Original Paper

When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models

Abhinaba Basu

arXiv · 2603.20997

The Takeaway

The paper reframes attention as a representation constructor (one that writes pairwise matches into token states) rather than merely a computation mechanism. This matters for researchers designing hybrid architectures (such as Mamba-Attention) because it maps exactly how many attention layers are needed to enable routing.

From the abstract

We identify a routing paradox in hybrid recurrent-attention architectures: content-based routing - deciding which tokens deserve expensive attention - requires exactly the pairwise computation that routing is designed to avoid. Through 20+ controlled experiments across three tasks (a synthetic diagnostic, the Zoology MQAR benchmark, and HotpotQA), we map the routing landscape exhaustively. One layer of softmax attention creates a latent ~34-dimensional subspace enabling 98.4% routing precision;
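The core mechanism described in the abstract, one softmax attention layer writing pairwise match information into the residual stream, which a cheap per-token router can then read linearly, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the weight matrices, the linear routing probe `w_route`, and the top-k selection are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16  # sequence length, embedding dimension

X = rng.normal(size=(n, d))  # toy token embeddings
# Hypothetical projection weights (random stand-ins for learned parameters)
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# One softmax attention layer: the expensive O(n^2) pairwise computation.
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)          # row-stochastic attention weights
H = X + A @ V                               # residual stream now carries match info

# A cheap router can now score each token with a single linear read of H,
# without redoing the pairwise computation itself.
w_route = rng.normal(size=d)                # hypothetical learned routing probe
route_scores = H @ w_route                  # one scalar score per token
selected = np.argsort(route_scores)[-2:]    # top-2 tokens sent to expensive attention
```

The point of the sketch is the ordering: the pairwise match (`Q @ K.T`) happens *before* routing, so the router only ever reads features that attention has already written into `H`, mirroring the paradox the paper identifies.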