Stop guessing how many heads your Transformer needs; this model grows its own 'brain' based on the task's complexity.
April 14, 2026
Original Paper
INCRT: An Incremental Transformer That Determines Its Own Architecture
arXiv · 2604.10703
The Takeaway
The INCRT architecture starts with a single attention head and uses a geometric measure of task structure to add or prune heads during training. This eliminates manual architectural design by letting the model's size evolve with the problem it is solving.
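The paper's actual growth criterion is not reproduced here, but the grow-and-prune loop can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `effective_rank` score stands in for the paper's "geometric quantity of task structure", the cosine-similarity pruning rule and the `IncrementalHeads` controller are hypothetical, and real heads would of course be trained, not random projections.

```python
import numpy as np

def effective_rank(m, tol=1e-12):
    """Entropy-based effective rank of a matrix (a stand-in for the
    paper's geometric measure of task structure)."""
    s = np.linalg.svd(m, compute_uv=False)
    p = s / (s.sum() + tol)
    return float(np.exp(-(p * np.log(p + tol)).sum()))

class IncrementalHeads:
    """Hypothetical controller: grow heads while the data's effective
    rank exceeds total head capacity; prune heads that are nearly
    collinear with an already-kept head."""

    def __init__(self, d_model, d_head, seed=0):
        self.rng = np.random.default_rng(seed)
        self.d_model, self.d_head = d_model, d_head
        self.heads = [self._new_head()]  # start from a single head

    def _new_head(self):
        return self.rng.standard_normal((self.d_model, self.d_head))

    def step(self, activations, redundancy_tol=0.99):
        # Grow: one new head if the task's structure outstrips capacity.
        if effective_rank(activations) > len(self.heads) * self.d_head:
            self.heads.append(self._new_head())
        # Prune: drop heads whose projection duplicates a kept one.
        keep = []
        for h in self.heads:
            cos = (abs(np.vdot(h.ravel(), k.ravel()))
                   / (np.linalg.norm(h) * np.linalg.norm(k))
                   for k in keep)
            if not any(c > redundancy_tol for c in cos):
                keep.append(h)
        self.heads = keep
        return len(self.heads)
```

Running `step` repeatedly on high-rank activations grows the head count one at a time, while rank-1 activations leave the single starting head alone; a duplicated head is pruned on the next step.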
From the abstract
Transformer architectures are designed by trial and error: the number of attention heads, the depth, and the head size are fixed before training begins, with no mathematical principle to guide the choice. The result is systematic structural redundancy -- between half and four-fifths of all heads in a trained model can be removed without measurable loss -- because the architecture allocates capacity without reference to the actual requirements of the task. This paper introduces INCRT (Incremental