AI & ML Paradigm Shift

A linear-time attention mechanism that is weight-compatible with standard pretrained Transformers, allowing for direct knowledge transfer.

March 20, 2026

Original Paper

MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

Zuher Jahshan, Ben Ben Ishay, Leonid Yavits

arXiv · 2603.18676

The Takeaway

Most linear attention alternatives require expensive retraining from scratch. By using a 'weight-copy' approach from Multi-Head Attention, this framework allows practitioners to upgrade existing quadratic-complexity models to linear-time scaling without losing their pre-learned knowledge.
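As a rough illustration of how a weight-copy upgrade like this can work, the sketch below reuses the projection weights of a pretrained `nn.MultiheadAttention` inside a kernelized linear-attention layer. The class name, the elu-plus-one feature map, and all other details are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): reuse pretrained MHA projection
# weights inside a kernelized linear-attention layer instead of retraining
# from scratch. The elu+1 feature map is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttentionFromMHA(nn.Module):
    def __init__(self, mha: nn.MultiheadAttention):
        super().__init__()
        self.h = mha.num_heads
        self.d = mha.embed_dim // mha.num_heads
        # Copy the pretrained in/out projections ("weight-copy" step).
        self.in_proj_weight = nn.Parameter(mha.in_proj_weight.detach().clone())
        self.in_proj_bias = nn.Parameter(mha.in_proj_bias.detach().clone())
        self.out_proj = nn.Linear(mha.embed_dim, mha.embed_dim)
        self.out_proj.load_state_dict(mha.out_proj.state_dict())

    @staticmethod
    def phi(x):
        # Positive feature map that replaces the softmax.
        return F.elu(x) + 1.0

    def forward(self, x):                      # x: (batch, seq, embed)
        b, n, e = x.shape
        qkv = F.linear(x, self.in_proj_weight, self.in_proj_bias)
        q, k, v = qkv.chunk(3, dim=-1)
        # Split heads: (batch, heads, seq, head_dim).
        q, k, v = (t.view(b, n, self.h, self.d).transpose(1, 2) for t in (q, k, v))
        q, k = self.phi(q), self.phi(k)
        # Linear attention: cost grows linearly with sequence length n.
        kv = torch.einsum("bhnd,bhne->bhde", k, v)
        z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
        return self.out_proj(out.transpose(1, 2).reshape(b, n, e))


mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
lin = LinearAttentionFromMHA(mha)
y = lin(torch.randn(2, 128, 64))               # -> (2, 128, 64)
```

Because the query/key/value projections are reused verbatim, the linear layer starts from the pretrained model's representation space rather than from random initialization, which is the general reason a short adaptation run can stand in for full retraining.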

From the abstract

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract conceptual representations.
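To make the workspace-bottleneck idea concrete, here is a minimal sketch of one way a trainable memory could mediate token communication: a small set of learned slots attends over the sequence and is then read back by every token, keeping the per-layer cost linear in sequence length. The class name, slot count, and overall design are assumptions for illustration, not MANAR's published layer.

```python
# Illustrative sketch only (assumed design, not MANAR's actual layer):
# a small trainable "workspace" memory reads from all tokens and broadcasts
# back, so the per-layer cost is O(n * m) with m fixed memory slots.
import torch
import torch.nn as nn


class WorkspaceBottleneck(nn.Module):
    def __init__(self, dim: int, num_slots: int = 16, num_heads: int = 4):
        super().__init__()
        # Trainable memory slots acting as the global workspace.
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.read = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq, dim)
        b = x.size(0)
        slots = self.slots.unsqueeze(0).expand(b, -1, -1)
        # Write phase: workspace slots attend over all tokens (O(n * m)).
        workspace, _ = self.read(slots, x, x)
        # Broadcast phase: every token attends over the small workspace.
        out, _ = self.write(x, workspace, workspace)
        return x + out                          # residual update


layer = WorkspaceBottleneck(dim=64)
y = layer(torch.randn(2, 512, 64))              # -> (2, 512, 64)
```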