AI & ML Efficiency Breakthrough

Recovers short-text performance in context-extended LLMs using 60x less data than current state-of-the-art distillation methods.

April 2, 2026

Original Paper

LinearARD: Linear-Memory Attention Distillation for RoPE Restoration

Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang

arXiv · 2604.00004

The Takeaway

Context extension typically degrades short-context performance. This paper shows how to restore those capabilities by aligning the extended model's attention distributions with the original model's via a linear-memory kernel, requiring only 4M training tokens compared to the standard 256M.

From the abstract

The extension of context windows in Large Language Models is typically facilitated by scaling positional encodings followed by lightweight Continual Pre-Training (CPT). While effective for processing long sequences, this paradigm often disrupts original model capabilities, leading to performance degradation on standard short-text benchmarks. We propose LinearARD, a self-distillation method that restores Rotary Position Embeddings (RoPE)-scaled students through attention-structure consistency with the original model.
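
To make the idea concrete, here is a minimal sketch of attention-structure distillation between a frozen teacher (the original model) and a RoPE-scaled student. It is written in PyTorch as an assumption; the tensor names, the KL objective, and the output-matching variant are illustrative, and neither function reproduces the paper's linear-memory kernel. The quadratic KL loss is the standard formulation whose memory cost LinearARD is designed to avoid; the output-matching surrogate only shows how an objective can sidestep materializing the full attention matrix.

```python
# Illustrative sketch only: aligning a RoPE-scaled student's attention with a
# frozen teacher's. Names and losses are assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def attention_kl_loss(q_s, k_s, q_t, k_t):
    """KL divergence between teacher and student attention rows.

    Materializes the full (seq, seq) attention matrices, so memory grows
    quadratically with sequence length -- the cost a linear-memory
    formulation avoids.
    """
    scale = q_s.shape[-1] ** -0.5
    log_p_s = F.log_softmax(q_s @ k_s.transpose(-2, -1) * scale, dim=-1)
    p_t = F.softmax(q_t @ k_t.transpose(-2, -1) * scale, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

def attention_output_loss(q_s, k_s, q_t, k_t, v):
    """Linear-memory surrogate: match attention *outputs* rather than the
    full distributions. The fused SDPA kernel (flash / memory-efficient
    backends) avoids storing the (seq, seq) matrix. Shown only to illustrate
    the memory trade-off, not the paper's kernel.
    """
    out_s = F.scaled_dot_product_attention(q_s, k_s, v)
    out_t = F.scaled_dot_product_attention(q_t, k_t, v)
    return F.mse_loss(out_s, out_t)

# Toy usage with random activations (batch=2, heads=4, seq=128, head_dim=64).
q_s, k_s = torch.randn(2, 4, 128, 64), torch.randn(2, 4, 128, 64)
q_t, k_t = torch.randn(2, 4, 128, 64), torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
print(attention_kl_loss(q_s, k_s, q_t, k_t).item())
print(attention_output_loss(q_s, k_s, q_t, k_t, v).item())
```

In practice the teacher activations would come from the original, pre-extension checkpoint with gradients disabled, and the distillation loss would be applied per layer and head; those details are left out of this sketch.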