The 'Progressive Intensity Hypothesis' states that weaker perturbations (pruning) should precede stronger ones (quantization) for optimal joint model compression.
March 20, 2026
Original Paper
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
arXiv · 2603.18426
The Takeaway
The paper gives a clear, theoretically backed answer to a common practitioner question: in what order should the stages of a multi-stage compression pipeline be applied? The findings hold across both vision and language models, offering a standard pipeline for high-efficiency deployment.
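To make the ordering question concrete, here is a minimal sketch that applies the same two steps to a random weight matrix in both orders. It is an illustration only, not the paper's method: the helper names (`magnitude_prune`, `uniform_quantize`, `compress`), the 50% sparsity, the 4-bit width, and the use of reconstruction error as a metric are all assumptions chosen for this example, and a toy matrix with no fine-tuning should not be expected to reproduce the paper's results.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value is the pruning threshold.
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)


def uniform_quantize(w, bits):
    """Symmetric uniform quantization onto a grid with 2**(bits-1) - 1 levels per sign."""
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return np.round(w / scale) * scale


def compress(w, order, sparsity=0.5, bits=4):
    """Apply the named steps in the given order (hypothetical helper)."""
    steps = {
        "prune": lambda x: magnitude_prune(x, sparsity),
        "quantize": lambda x: uniform_quantize(x, bits),
    }
    out = w
    for name in order:
        out = steps[name](out)
    return out


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))

pq = compress(w, ["prune", "quantize"])   # prune-then-quantize
qp = compress(w, ["quantize", "prune"])   # quantize-then-prune

# Mean squared reconstruction error of each pipeline against the original weights.
err_pq = float(np.mean((w - pq) ** 2))
err_qp = float(np.mean((w - qp) ** 2))
print(f"prune-then-quantize MSE: {err_pq:.6f}")
print(f"quantize-then-prune MSE: {err_qp:.6f}")
```

Because the order is just a list passed to `compress`, the same harness could be pointed at real layers and a task metric to compare the two pipelines under more realistic conditions.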
From the abstract
What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as pruning and quantization. A central but underexplored factor in joint model compression is the compression order, or the sequence of different methods within the compression pipeline. Most prior studies have either sidestepped the issue by assuming orthogonality be