Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.
March 26, 2026
Original Paper
Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification
arXiv · 2603.24058
The Takeaway
This is a lightweight, training-free intervention that reduces hallucination rates in Large Vision-Language Models (LVLMs) by up to 35%. It gives practitioners a way to improve the reliability of deployed multimodal models in high-stakes settings such as medical imaging or autonomous driving without expensive retraining.
From the abstract
Object hallucination in Large Vision-Language Models (LVLMs) severely compromises their reliability in real-world applications, posing a critical barrier to their deployment in high-stakes scenarios such as autonomous driving and medical image analysis. Through systematic empirical investigation, we identify that imbalanced attention allocation, both across modalities (i.e., vision and language) and within modalities (among individual tokens), exhibits a strong causal correlation with the occurrence of object hallucinations.
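To make the idea of a decoding-time rectification concrete, here is a minimal, hypothetical sketch of how one might rebalance the attention mass a decoder query assigns to vision versus text tokens and then renormalize. This is not the paper's AIR algorithm (its exact procedure is not quoted above); the function and parameter names (rebalance_attention, target_vision_share) are assumptions for illustration only.

```python
import torch

def rebalance_attention(attn: torch.Tensor,
                        vision_mask: torch.Tensor,
                        target_vision_share: float = 0.5,
                        eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical rectification of cross-modal attention imbalance.

    attn: (batch, heads, q_len, k_len) post-softmax attention weights.
    vision_mask: (k_len,) bool, True where the key position is an image token.
    Rescales the total mass on vision vs. text tokens toward target_vision_share,
    preserving the relative ordering of tokens within each modality.
    """
    vision = attn * vision_mask                   # mass currently on image tokens
    text = attn * (~vision_mask)                  # mass currently on text tokens
    v_mass = vision.sum(dim=-1, keepdim=True)
    t_mass = text.sum(dim=-1, keepdim=True)
    rebalanced = (vision * (target_vision_share / (v_mass + eps)) +
                  text * ((1.0 - target_vision_share) / (t_mass + eps)))
    # Renormalize so each query's weights again sum to 1.
    return rebalanced / rebalanced.sum(dim=-1, keepdim=True).clamp_min(eps)

# Toy example: a query that puts only 10% of its attention on the two image tokens.
attn = torch.tensor([[[[0.05, 0.05, 0.30, 0.60]]]])        # (1, 1, 1, 4)
vision_mask = torch.tensor([True, True, False, False])
print(rebalance_attention(attn, vision_mask))               # vision share moves to 0.5
```

In practice such a correction would be applied inside the attention layers (or to the logits before softmax) during generation only, which is what keeps this family of methods training-free.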