Sensitivity to compression in Transformers spans five orders of magnitude, with early-layer MLP up-projections identified as catastrophic failure points.
March 24, 2026
Original Paper
Structural Sensitivity in Compressed Transformers: Error Propagation, Lyapunov Stability, and Formally Verified Bounds
arXiv · 2603.20991
The Takeaway
Practitioners performing quantization or pruning can use the paper's formally verified bounds (mechanized in Lean 4) to identify which specific matrices will tank performance when compressed; in GPT-2 Small, a single matrix out of 468 can do so. The paper also introduces a Compression Fragility Index to rank-order model robustness across architectures from 117M to 8B parameters.
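The paper defines its own Compression Fragility Index; the sketch below is only a rough illustration of the kind of per-matrix scan such an index implies. It compresses one GPT-2 Small weight matrix at a time with naive round-to-nearest quantization, measures the perplexity ratio against the uncompressed baseline, and ranks matrices by how badly they degrade. The `fake_quantize` helper, the 2-bit setting, the short probe text, and the perplexity-ratio metric are all assumptions for illustration, not the paper's method.

```python
# Illustrative sketch only: not the paper's Compression Fragility Index.
# Quantize one weight matrix at a time, measure perplexity, restore, rank.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def fake_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Symmetric round-to-nearest quantization (an assumed, deliberately crude scheme)."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    return torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

@torch.no_grad()
def perplexity(model, input_ids) -> float:
    loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")
text = "The quick brown fox jumps over the lazy dog. " * 40  # tiny probe text, not a real eval set
input_ids = tok(text, return_tensors="pt").input_ids[:, :512]

baseline = perplexity(model, input_ids)
ranking = []
for name, p in model.named_parameters():
    if p.dim() != 2:                      # only weight matrices, skip biases / LayerNorm
        continue
    original = p.data.clone()
    p.data = fake_quantize(original)      # compress this one matrix only
    ranking.append((perplexity(model, input_ids) / baseline, name))
    p.data = original                     # restore before scanning the next matrix

for ratio, name in sorted(ranking, reverse=True)[:10]:
    print(f"{ratio:10.1f}x  {name}")      # most fragile matrices first
```

The pattern is what matters here (isolate, compress, measure, restore), not the exact numbers; on a CPU the scan over GPT-2 Small takes a few minutes.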
From the abstract
A single matrix out of 468 in GPT-2 Small can increase perplexity by 20,000x when compressed, revealing that transformer compression sensitivity spans five orders of magnitude. We map this sensitivity landscape across five architectures (117M-8B parameters), finding a consistent hierarchy: early-layer MLP up-projections are catastrophically sensitive while value projections compress nearly for free. This hierarchy is stable across compression levels, evaluation scales (2K-51K tokens), and datasets.
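One way to read the excerpt's headline numbers is as a per-matrix perplexity ratio (my notation; the paper may define its sensitivity metric differently):

$$
S(W_i) \;=\; \frac{\mathrm{PPL}(\theta \mid W_i \text{ compressed, all else intact})}{\mathrm{PPL}(\theta)}, \qquad i = 1, \dots, 468 \text{ for GPT-2 Small}.
$$

On this reading, the worst single matrix reported above reaches $S \approx 2 \times 10^{4}$, while value projections that "compress nearly for free" sit near $S \approx 1$; the gap between these extremes is presumably what the five-orders-of-magnitude figure captures.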