AI & ML New Capability

Integrates radiologist gaze data as a probabilistic prior to align vision-language models with actual human clinical reasoning workflows.

March 30, 2026

Original Paper

Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

Kang Liu, Zhuoqi Ma, Siyu Liang, Yunan Li, Xiyue Gao, Chao Liang, Kun Xie, Qiguang Miao

arXiv · 2603.26049

The Takeaway

Standard medical VLM pretraining treats radiographs as context-agnostic images, ignoring where clinicians actually look. By supervising attention maps with radiologist gaze, CoGaze steers the model toward diagnostically salient regions, yielding a +23% gain in zero-shot classification and noticeably more reliable report generation.
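The excerpt doesn't spell out the training objective, but gaze supervision of this kind is commonly implemented as a divergence penalty between the model's attention map and a fixation heatmap. A minimal sketch of that idea (the function name, shapes, and KL formulation are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def gaze_alignment_loss(attn_map, gaze_map, eps=1e-8):
    # KL(gaze || attention): penalizes attention mass that diverges
    # from the radiologist's fixation distribution. Both inputs are
    # 2-D saliency maps over image patches (hypothetical shapes).
    p = gaze_map.ravel() + eps
    p = p / p.sum()
    q = attn_map.ravel() + eps
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# A uniform attention map is penalized wherever gaze is peaked;
# attention matching the fixation pays (near) zero loss.
gaze = np.zeros((4, 4))
gaze[1, 1] = 1.0                      # single fixation on one patch
uniform = np.ones((4, 4))
assert gaze_alignment_loss(gaze, gaze) < gaze_alignment_loss(uniform, gaze)
```

Added to a standard contrastive or captioning objective with a small weight, a term like this would nudge the vision encoder's attention toward the regions radiologists actually inspect, without changing the main pretraining pipeline.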

From the abstract

Despite recent advances in medical vision-language pretraining, existing models still struggle to capture the diagnostic workflow: radiographs are typically treated as context-agnostic images, while radiologists' gaze -- a crucial cue for visual reasoning -- remains largely underexplored by existing methods. These limitations hinder the modeling of disease-specific patterns and weaken cross-modal alignment. To bridge this gap, we introduce CoGaze, a Context- and Gaze-guided vision-language pretraining...