Formally proves that a causal Transformer layer is mathematically equivalent to a stateless Differentiable Neural Computer.
March 23, 2026
Original Paper
Transformers are Stateless Differentiable Neural Computers
arXiv · 2603.19272
The Takeaway
This result unifies two major branches of ML architecture, providing a rigorous 'memory-centric' interpretation of self-attention. It lets researchers apply decades of work on addressable memory and neural computers directly to the analysis and optimization of Transformers.
From the abstract
Differentiable Neural Computers (DNCs) were introduced as recurrent architectures equipped with an addressable external memory supporting differentiable read and write operations. Transformers, in contrast, are nominally feedforward architectures based on multi-head self-attention. In this work we give a formal derivation showing that a causal Transformer layer is exactly a stateless Differentiable Neural Computer (sDNC) where (1) the controller has no recurrent internal state, (2) the external […]
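The correspondence the abstract describes can be made concrete with a small numerical sketch. The code below is illustrative and not taken from the paper: it computes single-head causal self-attention the usual way, then recomputes it as append-only writes of key-value pairs into an external memory followed by content-based (softmax) reads, and checks that the two agree. The function names (causal_attention, memory_read_view) and the NumPy setup are assumptions for illustration, not the paper's formalism.

    # Minimal sketch (assumed, not from the paper): causal attention computed two ways.
    import numpy as np

    def causal_attention(Q, K, V):
        """Standard single-head causal self-attention over a length-T sequence."""
        T, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)                    # (T, T) similarity scores
        mask = np.tril(np.ones((T, T), dtype=bool))      # causal mask: attend to past only
        scores = np.where(mask, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over visible positions
        return weights @ V

    def memory_read_view(Q, K, V):
        """Same computation phrased DNC-style: append-only writes to an external
        memory, then content-based (softmax) reads against that memory."""
        T, d = Q.shape
        mem_keys, mem_vals, reads = [], [], []
        for t in range(T):
            mem_keys.append(K[t])                        # write step: append key/value,
            mem_vals.append(V[t])                        #   never overwrite (append-only)
            sims = np.stack(mem_keys) @ Q[t] / np.sqrt(d)
            w = np.exp(sims - sims.max())
            w /= w.sum()                                 # content-based addressing weights
            reads.append(w @ np.stack(mem_vals))         # read step: weighted sum of values
        return np.stack(reads)

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
    assert np.allclose(causal_attention(Q, K, V), memory_read_view(Q, K, V))

The explicit loop makes the 'stateless controller' reading visible: each position's output depends only on the memory written so far, not on any hidden state carried between steps.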