A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.
arXiv · March 13, 2026 · 2603.12071
Why it matters
It moves medical AI from simple classification toward grounded reasoning by enforcing consistency between 3D visual measurements and textual summaries; the authors report higher diagnostic accuracy than existing medical VLM baselines.
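To make the idea of consistency between measurements and text concrete, here is a minimal sketch of one way such a check could look. It is not the paper's method: the `RegionClaim` type, the function names, the region names, the volumes, and the 1% tolerance are all hypothetical, chosen only for illustration. It compares the direction of change implied by two regional volumes against the direction a generated summary asserts, and flags mismatches.

```python
from dataclasses import dataclass


@dataclass
class RegionClaim:
    """A direction-of-change statement extracted from a text summary (hypothetical)."""
    region: str
    direction: str  # "decreased", "stable", or "increased"


def percent_change(baseline_mm3: float, followup_mm3: float) -> float:
    """Relative volume change from baseline to follow-up, in percent."""
    return 100.0 * (followup_mm3 - baseline_mm3) / baseline_mm3


def measured_direction(change_pct: float, tolerance_pct: float = 1.0) -> str:
    """Map a percent change to a coarse label; the tolerance is an assumed value."""
    if change_pct <= -tolerance_pct:
        return "decreased"
    if change_pct >= tolerance_pct:
        return "increased"
    return "stable"


def inconsistent_claims(baseline, followup, claims, tolerance_pct=1.0):
    """Return the regions whose textual claim disagrees with the measured change."""
    flagged = []
    for claim in claims:
        change = percent_change(baseline[claim.region], followup[claim.region])
        if measured_direction(change, tolerance_pct) != claim.direction:
            flagged.append(claim.region)
    return flagged


# Illustrative volumes in cubic millimetres (made-up numbers, not from the paper).
baseline = {"hippocampus": 4000.0, "ventricles": 30000.0}
followup = {"hippocampus": 3800.0, "ventricles": 30100.0}
claims = [
    RegionClaim("hippocampus", "decreased"),  # matches the -5% measured change
    RegionClaim("ventricles", "increased"),   # measured change is within tolerance
]
flagged = inconsistent_claims(baseline, followup, claims)
print(flagged)
```

In this toy case only the ventricle claim is flagged, because the measured change stays inside the assumed tolerance while the text asserts an increase. A real system would need clinically motivated thresholds and a reliable way to extract claims from free text.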
From the abstract
Longitudinal brain MRI is essential for characterizing the progression of neurological diseases such as Alzheimer's disease. However, current deep-learning tools fragment this process: classifiers reduce a scan to a label, volumetric pipelines produce uninterpreted measurements, and vision-language models (VLMs) may generate fluent but potentially hallucinated conclusions. We present LoV3D, a pipeline for training 3D vision-language models, which reads longitudinal T1-weighted brain M