AI & ML Efficiency Breakthrough

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

arXiv · March 16, 2026 · 2603.12989

Zhifang Zhang, Bojun Yang, Shuo He, Weitong Chen, Wei Emma Zhang, Olaf Maennel, Lei Feng, Miao Xu

Why it matters

Defending against backdoors usually requires expensive fine-tuning on clean data; this method operates purely at inference time by identifying abnormal cross-modal attention patterns, offering a plug-and-play security layer for deployed VLMs.
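The core idea — spotting visual tokens that absorb an abnormally large share of cross-modal attention and dropping them before generation — can be sketched as follows. This is an illustrative toy, not the paper's actual CleanSight algorithm: the function name, the z-score outlier test, and the threshold are assumptions made here for clarity.

```python
import numpy as np

def prune_attention_stealing_tokens(attn, z_thresh=3.0):
    """Flag visual tokens that receive anomalously high cross-modal attention.

    attn: [num_text_tokens, num_visual_tokens] attention weights.
    Returns a boolean keep-mask over visual tokens (False = prune).
    Note: the z-score criterion here is a stand-in, not the paper's detector.
    """
    # Average attention mass each visual token receives from text tokens.
    mass = attn.mean(axis=0)
    # Standardize; a token far above the mean is an "attention stealing" suspect.
    z = (mass - mass.mean()) / (mass.std() + 1e-8)
    return z < z_thresh

# Synthetic example: 8 text tokens attending over 16 visual tokens,
# with token 5 acting as a trigger that steals most of the attention.
rng = np.random.default_rng(0)
attn = rng.random((8, 16))
attn = attn / attn.sum(axis=1, keepdims=True)   # row-normalize
attn[:, 5] += 5.0                               # inflate the trigger token
attn = attn / attn.sum(axis=1, keepdims=True)   # renormalize rows

keep = prune_attention_stealing_tokens(attn)
print(int(keep.sum()))  # 15 — token 5 is flagged and pruned
```

In a real deployment the mask would be applied to the visual token sequence before it is fed to the language decoder, which is what makes the defense purely test-time: no gradients, no clean fine-tuning data.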

From the abstract

Despite the strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, where adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new