AI & ML Scaling Insight

The 'Progressive Intensity Hypothesis' posits that weaker perturbations (pruning) should precede stronger ones (quantization) for optimal joint model compression.

March 20, 2026

Original Paper

Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression

Minjun Kim, Jaehyeon Choi, Hyunwoo Yang, Jongjin Kim, Jinho Song, U Kang

arXiv · 2603.18426

The Takeaway

The paper provides a clear, theoretically grounded answer to a common practitioner question: what is the optimal sequence for multi-stage model compression? The findings hold across both vision and language models, offering a standard pipeline for high-efficiency deployment.
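To make the question concrete, here is a minimal sketch of the two candidate pipelines on a random weight matrix. This is an illustration, not the paper's experimental setup: the choice of unstructured magnitude pruning, symmetric uniform quantization, and plain reconstruction MSE as the comparison metric are all assumptions for the sake of the example.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest-|w| entries."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) <= thresh, 0.0, w)

def uniform_quantize(w, bits=4):
    """Symmetric uniform quantization to a given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale if scale > 0 else w.copy()

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))

# The two orders the paper's title asks about.
w_pq = uniform_quantize(magnitude_prune(w))   # prune -> quantize
w_qp = magnitude_prune(uniform_quantize(w))   # quantize -> prune

mse = lambda a: float(np.mean((w - a) ** 2))
print(f"prune->quantize reconstruction MSE: {mse(w_pq):.6f}")
print(f"quantize->prune reconstruction MSE: {mse(w_qp):.6f}")
```

In a real pipeline the comparison would be made on task loss after each stage (with fine-tuning between stages), not raw weight error; the sketch only shows why the two orders produce different compressed weights at all.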

From the abstract

What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as pruning and quantization. A central but underexplored factor in joint model compression is the compression order, or the sequence of different methods within the compression pipeline. Most prior studies have either sidestepped the issue by assuming orthogonality be…