AI & ML Efficiency Breakthrough

A training-free visual token pruning framework for Large Vision-Language Models that preserves geometric structure through subspace reconstruction.

March 24, 2026

Original Paper

ResPrune: Text-Conditioned Subspace Reconstruction for Visual Token Pruning in Large Vision-Language Models

Xu Li, Yi Zheng, Yuxuan Liang, Zhe Liu, Xiaolei Chen, Haotian Chen, Rui Zhu, Xiangyang Xue

arXiv · 2603.21105

The Takeaway

ResPrune lets practitioners immediately reduce KV-cache memory and inference latency in models such as Qwen2.5-VL or LLaVA without any retraining. It uses a lightweight greedy expansion strategy to retain only the visual tokens most informative relative to the text prompt.

From the abstract

Large Vision-Language Models (LVLMs) rely on dense visual tokens to capture fine-grained visual information, but processing all these tokens incurs substantial computational and memory overhead during inference. To address this issue, we propose ResPrune, a training-free visual token pruning framework that enables efficient LVLM inference by selecting a compact yet informative subset of visual tokens. ResPrune formulates visual token pruning as a subspace reconstruction problem and employs a greedy expansion strategy […]
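To make the idea concrete, here is a minimal sketch of greedy, text-conditioned token selection via subspace reconstruction. This is an illustrative approximation, not the paper's exact algorithm: the function name, the `alpha` text-bias weight, and the scoring rule (residual energy plus cosine similarity to the text embedding) are all assumptions. Each step picks the token whose residual carries the most unexplained energy, then orthogonalizes all residuals against it, so the selected tokens span a subspace that reconstructs the full token set well.

```python
import numpy as np

def greedy_token_prune(V, t, k, alpha=0.5):
    """Greedily select k of N visual tokens whose span best reconstructs
    all tokens, biased toward tokens aligned with the text embedding.

    V: (N, d) visual token matrix; t: (d,) text embedding.
    Returns a list of k selected token indices.
    Hypothetical sketch of subspace-reconstruction pruning, not ResPrune itself.
    """
    N, d = V.shape
    R = V.copy()  # residual of every token w.r.t. the span of selected tokens

    # Cosine similarity of each visual token to the text prompt embedding.
    text_sim = V @ t / (np.linalg.norm(V, axis=1) * np.linalg.norm(t) + 1e-8)

    selected = []
    for _ in range(k):
        # Score = unexplained residual energy + text-alignment bias (assumed form).
        scores = np.linalg.norm(R, axis=1) ** 2 + alpha * text_sim
        scores[selected] = -np.inf  # never re-select a token
        j = int(np.argmax(scores))
        selected.append(j)

        # Orthogonalize all residuals against the chosen token's residual
        # direction (Gram-Schmidt step), so future picks add new information.
        u = R[j] / (np.linalg.norm(R[j]) + 1e-8)
        R = R - np.outer(R @ u, u)
    return selected
```

Under this framing, pruning quality degrades gracefully with the budget `k`: once the selected tokens span the dominant directions of `V`, the remaining residual energy (and thus the value of keeping more tokens) is small.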