Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).
arXiv · March 16, 2026 · 2603.12430
Why it matters
It makes high-quality, interpretable reasoning available in surgical scene understanding, a specialized domain where general-purpose models like GPT-4 often fail on compositional tasks. The hierarchical reasoning framework and multi-center validation offer a blueprint for building domain-specific 'R1' models.
From the abstract
Surgical scene understanding demands not only accurate predictions but also interpretable reasoning that surgeons can verify against clinical expertise. However, existing surgical vision-language models generate predictions without reasoning chains, and general-purpose reasoning models fail on compositional surgical tasks without domain-specific knowledge. We present Surg-R1, a surgical Vision-Language Model that addresses this gap through hierarchical reasoning trained via a four-stage pipeline…