When an AI calls a blue banana 'yellow', it's not because it is blind; it's because it trusts its 'gut' feeling more than the actual photo in front of it.
April 13, 2026
Original Paper
Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts
arXiv · 2604.09364
The Takeaway
The paper identifies a specific 'arbitration' failure in vision-language models: the model's internal prior knowledge overrides the visual evidence it has already encoded. The implication is that fixing hallucinations doesn't require better perception, but better logic for resolving these internal conflicts.
From the abstract
When a Vision-Language Model (VLM) sees a blue banana and answers "yellow", is the problem one of perception or of arbitration? We explore this question across ten VLMs of various sizes and reveal an Encoding–Grounding Dissociation: models that fail to report what they see (and thus give a wrong answer) still encode the visual evidence as strongly as models that answer correctly. Using Multimodal Arbitration Crossover (MAC) analysis with layer-by-layer Logit Lens probing, we track the compet…
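To make the "layer-by-layer Logit Lens probing" mentioned in the abstract concrete, here is a minimal sketch of the general technique, not the paper's MAC analysis: each intermediate layer's hidden state is projected through the model's final norm and unembedding matrix to read off what the model would answer at that depth. The model (GPT-2) and prompt below are stand-ins; applied to a VLM, the same probe would be run on the language decoder's hidden states after the visual tokens are injected, to see at which layer the linguistic prior ("yellow") overtakes the visual evidence ("blue").

```python
# Minimal logit-lens sketch (illustrative only; GPT-2 and the prompt are placeholders,
# not the paper's models or data).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The color of a ripe banana is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding output plus one entry per transformer block.
for layer_idx, hidden in enumerate(out.hidden_states):
    # Project the last position's hidden state through the final layer norm and unembedding.
    h = model.transformer.ln_f(hidden[:, -1, :])
    logits = model.lm_head(h)
    top_token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer_idx:2d}: {top_token!r}")
```

Reading the per-layer top token like this is what lets one ask where in the network the answer flips from evidence-consistent to prior-consistent, which is the kind of crossover the MAC analysis is said to track.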