Sensitivity to compression in Transformers spans five orders of magnitude, with early-layer MLP up-projections identified as catastrophic failure points.
March 24, 2026
Original Paper
Structural Sensitivity in Compressed Transformers: Error Propagation, Lyapunov Stability, and Formally Verified Bounds
arXiv · 2603.20991
The Takeaway
Practitioners performing quantization or pruning can use the paper's formally verified bounds (mechanized in Lean 4) to identify which specific matrices will tank performance when compressed; in GPT-2 Small, a single matrix out of 468 can do so. The paper also introduces a Compression Fragility Index to rank-order model robustness across architectures from 117M to 8B parameters.
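The paper defines its own Compression Fragility Index; the sketch below is only a rough illustration of the kind of per-matrix scan such an index implies. It compresses one GPT-2 Small weight matrix at a time with naive round-to-nearest quantization, measures the perplexity ratio against the uncompressed baseline, and ranks matrices by how badly they degrade. The `fake_quantize` helper, the 2-bit setting, the short probe text, and the perplexity-ratio metric are all assumptions for illustration, not the paper's method.

```python
# Illustrative sketch only: not the paper's Compression Fragility Index.
# Quantize one weight matrix at a time, measure perplexity, restore, rank.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def fake_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric round-to-nearest quantization (an assumed, deliberately crude scheme)."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    return torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

@torch.no_grad()
def perplexity(model, input_ids) -> float:
    loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
text = "The quick brown fox jumps over the lazy dog. " * 40  # tiny probe text, not a real eval set
input_ids = tok(text, return_tensors="pt").input_ids[:, :512]

baseline = perplexity(model, input_ids)
ranking = []
for name, p in model.named_parameters():
    if p.dim() != 2:                      # only weight matrices, skip biases / LayerNorm
        continue
    original = p.data.clone()
    p.data = fake_quantize(original)      # compress this one matrix only
    ranking.append((perplexity(model, input_ids) / baseline, name))
    p.data = original                     # restore before scanning the next matrix

for ratio, name in sorted(ranking, reverse=True)[:10]:
    print(f"{ratio:10.1f}x  {name}")      # most fragile matrices first
```

The pattern is what matters here (isolate, compress, measure, restore), not the exact numbers; on a CPU the scan over GPT-2 Small takes a few minutes.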
From the abstract
A single matrix out of 468 in GPT-2 Small can increase perplexity by 20,000x when compressed, revealing that transformer compression sensitivity spans five orders of magnitude. We map this sensitivity landscape across five architectures (117M-8B parameters), finding a consistent hierarchy: early-layer MLP up-projections are catastrophically sensitive while value projections compress nearly for free. This hierarchy is stable across compression levels, evaluation scales (2K-51K tokens), and datasets.
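One way to read the excerpt's headline numbers is as a per-matrix perplexity ratio (my notation; the paper may define its sensitivity metric differently):

$$
S(W_i) \;=\; \frac{\mathrm{PPL}(\theta \mid W_i \text{ compressed, all else intact})}{\mathrm{PPL}(\theta)}, \qquad i = 1, \dots, 468 \text{ for GPT-2 Small}.
$$

On this reading, the worst single matrix reported above reaches $S \approx 2 \times 10^{4}$, while value projections that "compress nearly for free" sit near $S \approx 1$; the gap between these extremes is presumably what the five-orders-of-magnitude figure captures.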