Replaces standard softmax attention, whose weights express only relative relevance, with screening (the core of the Multiscreen architecture) to allow absolute query-key relevance, yielding 3.2x faster inference at 100K context.
April 2, 2026
Original Paper
Screening Is Enough
arXiv · 2604.01178
The Takeaway
By removing global competition between keys and using an explicit relevance threshold, Multiscreen solves the 'relative score' problem of attention, enabling substantial memory savings and more stable long-context retrieval.
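To make "no global competition" and "explicit relevance threshold" concrete, here is a minimal sketch of what a threshold-based relevance rule could look like. The function names `screen` and `attend`, the threshold `tau`, and the clipped-linear form `min(1, max(0, s - tau))` are all hypothetical choices for illustration; the summary does not give the paper's actual screening function. The point is only that each key's weight depends on its own score, so irrelevant keys get exactly zero weight.

```python
def screen(scores, tau):
    """Hypothetical screening rule (an assumption, not the paper's definition).

    Each key's weight depends only on its own score and the threshold tau,
    so keys do not compete for a shared unit of mass.
    """
    # Clip to [0, 1]; any score at or below tau is rejected with weight 0.
    return [min(1.0, max(0.0, s - tau)) for s in scores]


def attend(weights, values):
    # Weighted sum of value vectors, with no normalization across keys.
    dim = len(values[0])
    out = [0.0] * dim
    for w, v in zip(weights, values):
        for i in range(dim):
            out[i] += w * v[i]
    return out


values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One clearly relevant key, two below the threshold.
w = screen([2.5, 0.2, -1.0], tau=1.0)
print(w)                    # [1.0, 0.0, 0.0]
print(attend(w, values))    # [1.0, 0.0]

# All keys below the threshold: every key is rejected and the output is zero.
w_none = screen([0.1, -0.3, 0.5], tau=1.0)
print(w_none)                  # [0.0, 0.0, 0.0]
print(attend(w_none, values))  # [0.0, 0.0]

# Adding another strong key leaves the existing weights untouched.
w_more = screen([2.5, 0.2, -1.0, 3.0], tau=1.0)
print(w_more[:3] == w)      # True
```

Contrast this with softmax, where adding a high-scoring key would shrink every other key's weight, and where an all-irrelevant set of keys still receives the full unit of mass.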
From the abstract
A core limitation of standard softmax attention is that it does not define a notion of absolute query–key relevance: attention weights are obtained by redistributing a fixed unit mass across all keys according to their relative scores. As a result, relevance is defined only relative to competing keys, and irrelevant keys cannot be explicitly rejected. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query–key relevance.
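The "fixed unit mass" limitation can be checked directly on plain softmax. The sketch below (standard softmax only, nothing specific to Multiscreen) shows three properties the abstract relies on: the weights always sum to 1, shifting every score by a constant leaves the weights unchanged (so only relative scores matter), and a set of uniformly irrelevant keys still shares the full mass with no key receiving zero weight.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability; this does not change the
    # result because softmax depends only on differences between scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Scores for one query against three keys.
scores = [2.0, 1.0, 0.5]
weights = softmax(scores)
print(round(sum(weights), 6))  # 1.0: a fixed unit mass is always distributed

# Shifting every score by the same amount leaves the weights unchanged,
# so "how relevant is this key?" has no absolute answer.
shifted = softmax([s - 100.0 for s in scores])
print(all(abs(a - b) < 1e-12 for a, b in zip(weights, shifted)))  # True

# Even if every key is irrelevant (uniformly very low scores), the mass is
# still spread over them: no key can be rejected outright.
irrelevant = softmax([-50.0, -50.0, -50.0])
print(irrelevant)  # each weight is 1/3
```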