Introduces a method to give frozen LLMs persistent memory in their continuous latent space, bypassing the need for text-level RAG or retraining.
arXiv · March 18, 2026 · 2603.16413
The Takeaway
The paper demonstrates that information can be written to and read from a frozen model's latent space differentiably at inference time. This enables "conversational learning" without gradient updates, potentially replacing bulky external text memories with compact numerical arrays.
From the abstract
Frozen encoder-decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a proof-of-concept pilot study showing that persistent memory in the *continuous latent space* of a frozen LLM is feasible, even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three […]
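The core idea, a frozen backbone with small trainable adapters reading from and writing to a persistent latent array, can be sketched in miniature. This is a toy illustration under my own assumptions, not the paper's implementation: the frozen encoder is a fixed random projection, the adapters are single linear maps, and the latent width is tiny (Flan-T5-XL's hidden size is far larger).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy latent width (real backbones use thousands of dimensions)

# Frozen backbone stand-in: a fixed projection that is never updated.
W_enc = rng.standard_normal((D, D)) / np.sqrt(D)

# Small adapters: in the paper's setup, only modules like these are trainable.
W_read = rng.standard_normal((D, 2 * D)) / np.sqrt(2 * D)
W_write = rng.standard_normal((D, 2 * D)) / np.sqrt(2 * D)

memory = np.zeros(D)  # compact persistent numerical array kept across sessions


def step(x, memory):
    """One forward pass that reads from and writes to the latent memory."""
    latent = W_enc @ x                     # frozen encoder pass
    joint = np.concatenate([latent, memory])
    conditioned = W_read @ joint           # read: condition the latent on memory
    new_memory = np.tanh(W_write @ joint)  # write: differentiable memory update
    return conditioned, new_memory


# Two "sessions": what the first pass writes persists into the second.
x1, x2 = rng.standard_normal(D), rng.standard_normal(D)
out1, memory = step(x1, memory)
out2, memory = step(x2, memory)
print(memory.shape)
```

Because every operation is differentiable, gradients can flow through the memory when training the adapters offline, while at conversation time the memory array itself carries information forward with no gradient updates at all.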