AI & ML Efficiency Breakthrough

A vector-wise sparse attention mechanism that accelerates long-context video inference by 2.6x with zero loss in accuracy.

April 1, 2026

Original Paper

VecAttention: Vector-wise Sparse Attention for Accelerating Long Context Inference

Anmin Liu, Ruixuan Yang, Huiqiang Jiang, Bin Lin, Minmin Sun, Yong Li, Chen Zhang, Tao Xie

arXiv · 2603.29494

The Takeaway

By identifying a distinctive 'vertical-vector' sparsity pattern in video attention maps, VecAttention avoids the redundant computation incurred by the coarse-grained sparse patterns used in standard models. It offers a practical path to scaling video transformers to massive context lengths without the usual quadratic attention cost.
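The contrast between coarse-grained and vector-wise sparsity can be sketched in a few lines. The toy below is an illustrative assumption, not the paper's actual algorithm: it scores each key vector (a "vertical" column of the attention map) with a cheap pooled-query proxy and keeps only the top-scoring keys for all queries, instead of masking out coarse blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    # Standard O(n^2) attention, for reference.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def vertical_sparse_attention(Q, K, V, keep_ratio=0.25):
    """Toy 'vertical-vector' sparse attention (a sketch, not VecAttention):
    rank each key vector by attention mass from a mean-pooled query, then
    attend only to the top keep_ratio fraction of keys for ALL queries."""
    n, d = K.shape
    q_pool = Q.mean(axis=0)             # cheap proxy query
    col_scores = K @ q_pool             # one score per key vector (column)
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(col_scores)[-k:]  # indices of kept key vectors
    scores = Q @ K[keep].T / np.sqrt(d)
    return softmax(scores) @ V[keep]
```

With `keep_ratio=1.0` the sparse variant reproduces dense attention exactly (it merely permutes the keys); with smaller ratios, compute shrinks linearly in the number of kept columns.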

From the abstract

Long-context video understanding and generation pose a significant computational challenge for Transformer-based video models due to the quadratic complexity of self-attention. While existing sparse attention methods employ coarse-grained patterns to improve efficiency, they typically incur redundant computation and suboptimal performance. To address this issue, in this paper, we propose VecAttention, a novel framework of vector-wise sparse attention that achieves superior accuracy-efficiency […]
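To see why the quadratic complexity the abstract mentions matters at video scale, a back-of-the-envelope FLOP count helps. The head dimension, token count, and keep ratio below are hypothetical illustrations, not figures from the paper:

```python
def attention_flops(n_tokens, d_head, keep_ratio=1.0):
    """Approximate matmul FLOPs for one attention head: QK^T plus
    scores @ V, times 2 for multiply-add. keep_ratio < 1 models a
    sparse pattern attending to only a fraction of key vectors."""
    kept = int(n_tokens * keep_ratio)
    return 2 * (n_tokens * kept * d_head) * 2

dense  = attention_flops(100_000, 128)        # hypothetical long video context
sparse = attention_flops(100_000, 128, 0.25)  # keep a quarter of the keys
print(f"dense/sparse FLOP ratio: {dense / sparse:.1f}x")  # prints 4.0x
```

Doubling the context quadruples the dense cost, while a fixed keep ratio reduces it proportionally, which is the lever any sparse attention scheme, vector-wise or coarse-grained, is pulling.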