AI & ML New Capability

Segment Anything Reasoner (StAR) introduces parallel test-time scaling to visual segmentation tasks, eliciting latent reasoning capabilities from base models.

arXiv · March 17, 2026 · 2603.14382

Seokju Yun, Dongheon Lee, Noori Bae, Jaesung Jun, Chanseul Cho, Youngmin Ro

The Takeaway

The paper shows that 'compute-at-inference' (search/rollout) strategies used in LLMs (like OpenAI's o1) transfer to computer vision. This lets models solve complex, multi-step reasoning queries for object localization that standard feed-forward segmentation models fail on.
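The core idea of parallel test-time scaling can be sketched as best-of-N sampling: draw several stochastic segmentation rollouts and keep the one a verifier scores highest. The sketch below is a minimal toy illustration, not the paper's actual method; `rollout`, `verifier_score`, and the pixel-set masks are hypothetical stand-ins for a real model and verifier.

```python
import random

def rollout(seed):
    # Hypothetical stand-in for one stochastic segmentation rollout:
    # returns a candidate mask as a set of (x, y) pixel coordinates.
    rng = random.Random(seed)
    return {(x, y) for x in range(8) for y in range(8) if rng.random() < 0.5}

def verifier_score(mask, target):
    # Proxy verifier: intersection-over-union against a reference mask.
    # A real system would score candidates without access to ground truth.
    inter = len(mask & target)
    union = len(mask | target)
    return inter / union if union else 0.0

def best_of_n(n, target):
    # Parallel test-time scaling: sample N candidates, keep the best-scored one.
    candidates = [rollout(seed) for seed in range(n)]
    return max(candidates, key=lambda m: verifier_score(m, target))

# Toy target: the top half of an 8x8 grid.
target = {(x, y) for x in range(8) for y in range(4)}
best = best_of_n(16, target)
```

Because the winner is the maximum over all candidates, its score can only match or exceed any single rollout's, which is the whole appeal of spending extra compute at inference time.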

From the abstract

As AI systems are being integrated more rapidly into diverse and complex real-world environments, the ability to perform holistic reasoning over an implicit query and an image to localize a target is becoming increasingly important. However, recent reasoning segmentation methods fail to sufficiently elicit the visual reasoning capabilities of the base model. In this work, we present Segment Anything Reasoner (StAR), a comprehensive framework that refines the design space from multiple perspectives.