AI & ML New Capability

Enables vision models to learn online from human corrections at inference time, reducing redundant manual effort in video segmentation by up to 34%.

March 31, 2026

Original Paper

Live Interactive Training for Video Segmentation

Xinyu Yang, Haozheng Yu, Yihong Sun, Bharath Hariharan, Jennifer J. Sun

arXiv · 2603.26929

The Takeaway

Current interactive tools like SAM2 treat every user correction as a localized patch, leaving the model's underlying knowledge unchanged for the rest of the video. LIT-LoRA introduces rapid on-the-fly weight updates, letting the tool genuinely improve over the course of a session and significantly accelerating data labeling and high-stakes video-editing workflows.
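To make the idea concrete, here is a minimal NumPy sketch of an "on-the-fly low-rank update": a frozen weight matrix is augmented with small trainable adapters (in the spirit of LoRA), and each user correction triggers a few gradient steps on the adapters alone. All sizes, names, and the learning rate are hypothetical illustrations, not the paper's actual architecture or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins; all sizes and names here are hypothetical, not from the paper.
d_in, d_out, rank = 8, 4, 2
W = rng.normal(size=(d_out, d_in))       # frozen backbone weight (never updated)
A = rng.normal(size=(rank, d_in))        # low-rank down-projection (trainable)
B = np.zeros((d_out, rank))              # low-rank up-projection, zero-init (trainable)

def predict(x):
    """Forward pass with the low-rank adapter folded in: (W + B @ A) @ x."""
    return (W + B @ A) @ x

def online_update(x, y_corrected, lr=0.01):
    """One gradient step on the adapters only, triggered by a user correction."""
    global A, B
    err = predict(x) - y_corrected        # residual against the corrected target
    a = A @ x
    # Gradients of 0.5 * ||err||^2 with respect to B and A; W stays frozen.
    B -= lr * np.outer(err, a)
    A -= lr * np.outer(B.T @ err, x)

# Simulate a session: the user corrects the model's output for one frame.
x = rng.normal(size=d_in)                # hypothetical per-frame feature
y_corrected = rng.normal(size=d_out)     # the user's corrected output
before = np.linalg.norm(predict(x) - y_corrected)
for _ in range(200):                      # a short burst of live updates
    online_update(x, y_corrected)
after = np.linalg.norm(predict(x) - y_corrected)
print(after < before)
```

Because only the small adapter matrices change, updates are cheap enough to run mid-session, and the frozen backbone preserves the model's general knowledge while the adapters absorb the user's corrections.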

From the abstract

Interactive video segmentation often requires many user interventions for robust performance in challenging scenarios (e.g., occlusions, object separations, camouflage). Yet even state-of-the-art models like SAM2 use corrections only for immediate fixes without learning from this feedback, leading to inefficient, repetitive user effort. To address this, we introduce Live Interactive Training (LIT), a novel framework for prompt-based visual systems where models also learn online from human …