Finds that as LLMs scale, their complex non-linear depth dynamics become well approximated by accurate, low-order linear surrogates.
arXiv · March 16, 2026 · 2603.12541
Why it matters
This challenges the view of LLMs as strictly opaque black boxes: larger models actually exhibit simpler, more predictable internal dynamics. That predictability opens the door to more efficient multi-layer interventions and energy-efficient control of model behavior.
From the abstract
Large language models are often viewed as high-dimensional nonlinear systems and treated as black boxes. Here, we show that transformer depth dynamics admit accurate low-order linear surrogates within context. Across tasks including toxicity, irony, hate speech and sentiment, a 32-dimensional linear surrogate reproduces the layerwise sensitivity profile of GPT-2-large with near-perfect agreement, capturing how the final output shifts under additive injections at each layer. We then uncover a sur
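The core idea, fitting a low-dimensional linear surrogate to layerwise depth dynamics and checking that it reproduces how additive injections at each layer shift the final state, can be sketched as follows. This is a minimal illustration on synthetic linear-ish dynamics, not the paper's pipeline: the layer count, widths, and the PCA-plus-least-squares fitting procedure are assumptions standing in for probing GPT-2-large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a transformer's depth dynamics: near-identity
# per-layer linear updates. Sizes are illustrative, not GPT-2-large's.
n_tokens, n_layers, d_model, k = 200, 12, 128, 32
layer_maps = [np.eye(d_model) + 0.05 * rng.standard_normal((d_model, d_model))
              for _ in range(n_layers)]

X0 = rng.standard_normal((n_tokens, d_model))        # "embedding" states
traj = [X0]
for A in layer_maps:
    traj.append(traj[-1] @ A.T)                      # depth trajectory per token
H = np.concatenate(traj)                             # all layerwise states

# Low-order surrogate: project onto the top-k principal directions of the
# depth trajectory, then fit one least-squares linear map per layer.
mu = H.mean(axis=0)
_, _, Vt = np.linalg.svd(H - mu, full_matrices=False)
P = Vt[:k]                                           # (k, d_model) projector

Z = [(X - mu) @ P.T for X in traj]                   # reduced trajectories
surrogate = [np.linalg.lstsq(Z[l], Z[l + 1], rcond=None)[0].T
             for l in range(n_layers)]               # Z[l+1] ≈ Z[l] @ A_l.T

# Layerwise sensitivity: propagate an additive injection at layer l to the
# final layer, in the full model vs. the k-dimensional surrogate.
delta = rng.standard_normal(d_model)
for l in (0, n_layers // 2, n_layers - 1):
    full = delta.copy()
    for A in layer_maps[l:]:
        full = A @ full                              # full-model propagation
    red = P @ delta
    for A in surrogate[l:]:
        red = A @ red                                # surrogate propagation
    # Agreement of the induced final-layer shift, measured in the subspace.
    err = np.linalg.norm(P @ full - red) / np.linalg.norm(P @ full)
    print(f"inject at layer {l}: relative surrogate error = {err:.3f}")
```

In a real experiment the trajectories would come from a model's residual stream, and the sensitivity profile would be measured on an actual output quantity (e.g. a toxicity or sentiment logit) rather than on state norms.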