AI & ML Nature Is Weird

Stop guessing how many heads your Transformer needs; this model grows its own 'brain' based on the task's complexity.

April 14, 2026

Original Paper

INCRT: An Incremental Transformer That Determines Its Own Architecture

Giansalvo Cirrincione

arXiv · 2604.10703

The Takeaway

The INCRT architecture starts with a single attention head and uses a geometric measure of task structure to add or prune heads during training. It eliminates manual architectural design by letting the model's size evolve with the problem it is solving.
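The grow-and-prune idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: the function `update_head_count`, the per-head `head_scores`, and both thresholds are invented stand-ins for whatever geometric criterion INCRT actually computes.

```python
# Hypothetical sketch of an incremental head schedule.
# The paper's actual geometric criterion is NOT reproduced here;
# head_scores stands in for some per-head utility measure.

def update_head_count(num_heads, head_scores,
                      add_thresh=0.5, prune_thresh=0.1, max_heads=8):
    """Grow or prune attention heads based on per-head utility scores.

    head_scores: one float per current head (assumed utility measure).
    Returns the new head count.
    """
    # Prune heads whose utility falls below the prune threshold.
    kept = [s for s in head_scores if s >= prune_thresh]
    num_heads = max(1, len(kept))  # always keep at least one head

    # If every surviving head is highly utilized, grow by one head.
    if kept and num_heads < max_heads and all(s >= add_thresh for s in kept):
        num_heads += 1
    return num_heads
```

With a schedule like this, the head count is a training-time variable rather than a fixed hyperparameter: saturated layers gain capacity, redundant heads disappear.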

From the abstract

Transformer architectures are designed by trial and error: the number of attention heads, the depth, and the head size are fixed before training begins, with no mathematical principle to guide the choice. The result is systematic structural redundancy -- between half and four-fifths of all heads in a trained model can be removed without measurable loss -- because the architecture allocates capacity without reference to the actual requirements of the task. This paper introduces INCRT (Incremental Transformer) …