AI & ML Efficiency Breakthrough

Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.

March 25, 2026

Original Paper

Hybrid Associative Memories

Leon Lufkin, Tomás Figliolia, Beren Millidge, Kamesh Krishnamurthy

arXiv · 2603.22325

The Takeaway

HAM layers address the KV cache memory explosion by selectively storing tokens: only those an internal RNN fails to predict are added to the cache. Users can tune a single threshold to trade off memory usage against performance, offering a far more flexible alternative to fixed-window or purely linear attention models.
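A minimal sketch of the idea, with hypothetical names (the paper's actual gating mechanism may differ): each token gets a "surprise" score, standing in for the internal RNN's prediction error, and only tokens whose surprise exceeds the tunable threshold are written into the KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 4
keys = rng.normal(size=(seq_len, d))
values = rng.normal(size=(seq_len, d))

# Stand-in for the RNN's per-token prediction error ("surprise");
# in a HAM layer this would come from the internal recurrent predictor.
surprises = rng.uniform(size=seq_len)

def select_kv(keys, values, surprises, threshold):
    """Keep only the (key, value) pairs whose surprise exceeds the threshold.

    Predictable tokens are left to the RNN's fixed-size state, so the
    cache grows only with the hard-to-predict part of the sequence.
    """
    mask = surprises > threshold
    return keys[mask], values[mask]

k_cache, v_cache = select_kv(keys, values, surprises, threshold=0.5)
print(f"cached {len(k_cache)} of {seq_len} tokens")
```

Raising the threshold shrinks the cache toward a purely recurrent model; lowering it toward zero recovers full attention, which is the single-knob trade-off described above.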

From the abstract

Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state, whereas self-attention stores every past time step, growing its state (the KV cache) linearly with the sequence length. This results in orthogonal strengths and weaknesses. Self-attention layers excel at retrieving information in the conte…