VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.
March 24, 2026
Original Paper
VGS-Decoding: Visual Grounding Score Guided Decoding for Hallucination Mitigation in Medical VLMs
arXiv · 2603.20314
The Takeaway
Hallucination in clinical settings is often driven by language priors overriding visual evidence. This method uses a "Visual Grounding Score" to amplify visually grounded tokens and suppress hallucinated ones during inference, yielding nearly 9% gains in recall with no additional training cost.
From the abstract
Medical Vision-Language Models (VLMs) often hallucinate by generating responses based on language priors rather than visual evidence, posing risks in clinical applications. We propose Visual Grounding Score Guided Decoding (VGS-Decoding), a training-free method to mitigate hallucinations during inference. Our key insight is that hallucinated tokens maintain or increase their probability when visual information is degraded, while visually grounded tokens decrease in probability. We introduce the
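The abstract's key insight suggests a contrastive-style reweighting: compare each candidate token's probability under the full image versus a degraded image, and boost tokens whose probability drops when visual evidence is removed. The paper's exact scoring formula is not given in this excerpt, so the sketch below is a plausible reconstruction under stated assumptions; the function name `vgs_reweight`, the degradation scheme, and the `alpha` weighting are all illustrative, not the authors' specification.

```python
import numpy as np

def vgs_reweight(p_full, p_degraded, alpha=1.0):
    """Hypothetical sketch of grounding-score-guided decoding.

    p_full:     next-token probabilities given the intact image
    p_degraded: next-token probabilities given a degraded image
                (e.g. blurred or masked -- assumed scheme)

    Grounded tokens lose probability under degradation, so their
    score (log-ratio) is positive; hallucinated tokens, driven by
    language priors, keep or gain probability and score <= 0.
    """
    eps = 1e-12
    score = np.log(p_full + eps) - np.log(p_degraded + eps)
    logits = np.log(p_full + eps) + alpha * score
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Toy example: token 0 is visually grounded (prob drops without the
# image), token 1 is a hallucination (prob rises on language priors).
p_full = np.array([0.6, 0.3, 0.1])
p_degraded = np.array([0.2, 0.6, 0.2])
p_new = vgs_reweight(p_full, p_degraded)
```

In this toy run the grounded token's mass grows at the expense of the hallucinated one, while the output remains a valid distribution; `alpha` trades off grounding pressure against the model's original prediction.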