Formally proves that a causal Transformer layer is mathematically equivalent to a stateless Differentiable Neural Computer.
March 23, 2026
Original Paper
Transformers are Stateless Differentiable Neural Computers
arXiv · 2603.19272
The Takeaway
This result unifies two major branches of ML architecture, providing a rigorous 'memory-centric' interpretation of self-attention. It lets researchers apply decades of work on addressable memory and neural computers directly to the analysis and optimization of Transformers.
From the abstract
Differentiable Neural Computers (DNCs) were introduced as recurrent architectures equipped with an addressable external memory supporting differentiable read and write operations. Transformers, in contrast, are nominally feedforward architectures based on multi-head self-attention. In this work we give a formal derivation showing that a causal Transformer layer is exactly a stateless Differentiable Neural Computer (sDNC) where (1) the controller has no recurrent internal state, (2) the external […]
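The correspondence the abstract describes can be made concrete with a small numerical sketch. The code below is illustrative and not taken from the paper: it computes single-head causal self-attention the usual way, then recomputes it as append-only writes of key-value pairs into an external memory followed by content-based (softmax) reads, and checks that the two agree. The function names (causal_attention, memory_read_view) and the NumPy setup are assumptions for illustration, not the paper's formalism.

    # Minimal sketch (assumed, not from the paper): causal attention computed two ways.
    import numpy as np

    def causal_attention(Q, K, V):
        """Standard single-head causal self-attention over a length-T sequence."""
        T, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)                    # (T, T) similarity scores
        mask = np.tril(np.ones((T, T), dtype=bool))      # causal mask: attend to past only
        scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over visible positions
        return weights @ V

    def memory_read_view(Q, K, V):
        """Same computation phrased DNC-style: append-only writes to an external
        memory, then content-based (softmax) reads against that memory."""
        T, d = Q.shape
        mem_keys, mem_vals, reads = [], [], []
        for t in range(T):
            mem_keys.append(K[t])                        # write step: append key/value,
            mem_vals.append(V[t])                        #   never overwrite (append-only)
            sims = np.stack(mem_keys) @ Q[t] / np.sqrt(d)
            w = np.exp(sims - sims.max())
            w /= w.sum()                                 # content-based addressing weights
            reads.append(w @ np.stack(mem_vals))         # read step: weighted sum of values
        return np.stack(reads)

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
    assert np.allclose(causal_attention(Q, K, V), memory_read_view(Q, K, V))

The explicit loop makes the 'stateless controller' reading visible: each position's output depends only on the memory written so far, not on any hidden state carried between steps.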