AI & ML Efficiency Breakthrough

ActTail uses Heavy-Tailed Self-Regularization theory to achieve 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform-sparsity methods.

arXiv · March 16, 2026 · 2603.12272

Wenwen Hou, Xinyuan Song, Shiwei Liu

Why it matters

This provides a principled, rather than heuristic, way to allocate sparsity budgets across Transformer layers. For practitioners, that means lower memory traffic and compute at inference time without the catastrophic quality drop usually associated with high sparsity; the sketch below makes the core mechanism concrete.
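
Here is a minimal PyTorch sketch of TopK magnitude-based activation sparsity, the operation the abstract names: for each token, keep only the largest-magnitude activations and zero the rest. The function name and the `keep_ratio` value are illustrative placeholders, not ActTail's actual per-layer allocation.

```python
import torch

def topk_activation_sparsify(x: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Zero all but the top-k activations by magnitude, independently per token."""
    k = max(1, int(x.shape[-1] * keep_ratio))
    _, idx = x.abs().topk(k, dim=-1)           # indices of the largest |activations|
    mask = torch.zeros_like(x, dtype=torch.bool)
    mask.scatter_(-1, idx, True)               # mark the kept positions
    return x * mask                            # sparsified activations

# 80% sparsity on a batch of token activations (keep_ratio = 0.2)
x = torch.randn(4, 4096)
sparse_x = topk_activation_sparsify(x, keep_ratio=0.2)
print((sparse_x == 0).float().mean().item())   # ~0.80
```

In practice the win comes from skipping the weight rows or columns that multiply the zeroed activations, which is why the perplexity cost of choosing *where* to sparsify matters so much.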

From the abstract

Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties of Transformer weights and thereby amplifying performance degradation. In this paper, we propose ActTail, a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in Heavy-Tailed Self-Regularization (HT-SR) theory.
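
The excerpt doesn't spell out the allocation rule, but HT-SR theory characterizes each layer by the power-law exponent α of its weight matrix's eigenvalue spectrum (smaller α means a heavier tail, associated with better-trained layers). The sketch below shows one plausible way such a signal could drive a global budget: the Hill-estimator fit is a standard power-law estimate, while `allocate_keep_ratios` (heavier-tailed layers keep more activations) is a hypothetical mapping for illustration, not necessarily ActTail's rule.

```python
import numpy as np

def powerlaw_alpha(W: np.ndarray, tail_frac: float = 0.5) -> float:
    """Estimate the power-law exponent alpha of the eigenvalue spectrum of
    W^T W (the HT-SR diagnostic) via the Hill estimator on the spectral tail."""
    eigs = np.linalg.eigvalsh(W.T @ W)                 # spectrum of the correlation matrix
    eigs = np.sort(eigs[eigs > 1e-12])                 # drop numerical zeros, ascending
    tail = eigs[int(len(eigs) * (1 - tail_frac)):]     # the largest eigenvalues
    # Hill / MLE estimator for a power-law tail: alpha = 1 + n / sum(ln(x / x_min))
    return 1.0 + len(tail) / np.sum(np.log(tail / tail[0]))

def allocate_keep_ratios(alphas, avg_keep: float = 0.2) -> np.ndarray:
    """Hypothetical global allocation: heavier-tailed layers (smaller alpha)
    get a larger keep ratio, normalized so the mean keep ratio is avg_keep."""
    inv = 1.0 / np.asarray(alphas, dtype=float)
    ratios = avg_keep * inv / inv.mean()
    return np.clip(ratios, 0.0, 1.0)

# Example: four random "weight matrices" stand in for Transformer projections
rng = np.random.default_rng(0)
layers = [rng.standard_normal((512, 512)) for _ in range(4)]
alphas = [powerlaw_alpha(W) for W in layers]
print(allocate_keep_ratios(alphas))   # per-layer keep ratios, mean ~0.2
```

The point of a global, spectrum-aware allocation is exactly what the abstract argues: a uniform 80% budget treats every projection as statistically identical, whereas weight spectra across layers are not.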