Fine-tuning concentrates learning almost entirely in the final layers of a network, regardless of model size or architecture.
April 23, 2026
Original Paper
Decomposing the Depth Profile of Fine-Tuning
arXiv · 2604.17177
The Takeaway
Fine-tuning a model doesn't change the whole brain, just the part closest to the output. This concentration persists even when gradient flow is rebalanced to push updates toward the middle layers, which suggests that deep networks have an inherent structural bias toward keeping their core representations fixed. This locality gradient means that most of the new behavior we see in fine-tuned models is a surface-level remapping: we are mostly teaching models how to speak, not how to think differently. Understanding this bias lets us design more efficient training methods that target the layers that actually change.
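One practical consequence of the takeaway above is selective fine-tuning: if updates concentrate near the output anyway, you can freeze the early blocks and train only the last few. A minimal PyTorch sketch of that idea follows; the function name and the block-list interface are illustrative, not the paper's method.

```python
import torch.nn as nn


def freeze_all_but_last(blocks: nn.ModuleList, n_trainable: int) -> int:
    """Freeze every block, then re-enable gradients for the last n_trainable.

    Returns the count of trainable parameter tensors as a sanity check.
    """
    for block in blocks:
        for p in block.parameters():
            p.requires_grad = False
    for block in blocks[-n_trainable:]:
        for p in block.parameters():
            p.requires_grad = True
    return sum(p.requires_grad for block in blocks for p in block.parameters())
```

With a 6-block stack of `nn.Linear` layers and `n_trainable=2`, only the weights and biases of the last two blocks receive gradients; the optimizer can then be built over just those parameters.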
From the abstract
Fine-tuning adapts pretrained networks to new objectives. Whether the resulting depth profile of representational change reflects an intrinsic property of the model or the magnitude of gradient flow has not been tested directly. We measure this profile across 240 fine-tuning runs spanning 15 models in four architecture families (encoder and decoder transformers, a state-space model, and an RNN) at scales from 125M to 6.9B parameters. Representational change concentrates in output-proximal layers.