Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.
March 25, 2026
Original Paper
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
arXiv · 2603.22324
The Takeaway
Post-training quantization (PTQ) typically minimizes per-layer weight reconstruction error, which ignores the direction of the fine-tuning update relative to the base model. DAQ lets practitioners compress fine-tuned LLMs to FP8/INT8 without losing the specific style or capabilities gained during SFT.
From the abstract
We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately corrupt the small-magnitude parameter deltas ($\Delta W$) that encode post-training behavior -- an effect we analyze through the lens of quantization as implicit regularization.
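To make the failure mode concrete: because $\Delta W$ is small relative to the quantization step, round-to-nearest can flip the sign of a delta entirely, erasing the direction the fine-tuning update pushed that weight. The toy sketch below (not the paper's algorithm; all function names and the one-step "nudge" heuristic are illustrative assumptions) contrasts naive symmetric INT8 quantization with a variant that repairs delta sign flips against a known base model:

```python
import numpy as np

def quantize_int8(w, scale):
    # Naive symmetric round-to-nearest INT8 quantization (dequantized view).
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

def delta_sign_preserving_quantize(w_ft, w_base, scale):
    # Toy delta-aware variant (illustrative, not DAQ itself):
    # quantize the fine-tuned weights, then nudge any entry whose
    # quantized delta (deq - w_base) flipped sign relative to the
    # true delta (w_ft - w_base) by one quantization step.
    q = np.clip(np.round(w_ft / scale), -127, 127)
    deq = q * scale
    delta = w_ft - w_base
    flipped = np.sign(deq - w_base) != np.sign(delta)
    # move one step in the direction of the original delta
    q = np.where(flipped & (delta != 0), q + np.sign(delta), q)
    q = np.clip(q, -127, 127)
    return q * scale

rng = np.random.default_rng(0)
w_base = rng.normal(size=1000)
delta = rng.normal(scale=0.01, size=1000)   # small post-training deltas
w_ft = w_base + delta
scale = np.max(np.abs(w_ft)) / 127

naive = quantize_int8(w_ft, scale)
aware = delta_sign_preserving_quantize(w_ft, w_base, scale)
naive_match = np.mean(np.sign(naive - w_base) == np.sign(delta))
aware_match = np.mean(np.sign(aware - w_base) == np.sign(delta))
print(f"delta-sign agreement: naive={naive_match:.2f}, aware={aware_match:.2f}")
```

Because the deltas here are an order of magnitude smaller than the quantization step, the naive quantizer scrambles many delta signs, while the sign-repairing variant preserves nearly all of them at the cost of at most one step of extra reconstruction error per repaired weight.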