Enables vision models to learn online from human corrections at inference time, reducing redundant manual effort in video segmentation by up to 34%.
March 31, 2026
Original Paper
Live Interactive Training for Video Segmentation
arXiv · 2603.26929
The Takeaway
Current interactive tools such as SAM2 treat every user correction as a localized patch, never updating the model's underlying knowledge for the rest of the video. LIT-LoRA introduces rapid on-the-fly weight updates, so the tool genuinely improves over the course of a session, significantly accelerating data labeling and high-stakes video editing workflows.
From the abstract
Interactive video segmentation often requires many user interventions for robust performance in challenging scenarios (e.g., occlusions, object separations, camouflage). Yet even state-of-the-art models like SAM2 use corrections only for immediate fixes without learning from this feedback, leading to inefficient, repetitive user effort. To address this, we introduce Live Interactive Training (LIT), a novel framework for prompt-based visual systems where models also learn online from human corrections…
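The paper's code isn't reproduced in this digest, but the core idea — freeze the pretrained backbone and take a few quick gradient steps through small low-rank (LoRA) adapters each time the user corrects a mask — can be sketched in plain PyTorch. Everything below (`LoRALinear`, `online_update`, the rank and learning-rate choices) is illustrative, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank residual.
    Hypothetical module for illustration, not the paper's code."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay fixed
        # Low-rank factors: B @ A has shape (out_features, in_features).
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at first
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.lora_a), self.lora_b)

def online_update(model, features, corrected_mask, optimizer, steps=3):
    """Take a few gradient steps on the user's corrected mask so the fix
    carries over to later frames instead of patching a single frame."""
    model.train()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(features)
        loss = F.binary_cross_entropy_with_logits(logits, corrected_mask)
        loss.backward()  # gradients reach only the LoRA parameters
        optimizer.step()
    model.eval()

# Toy usage: adapt a per-pixel mask head on one simulated correction.
head = LoRALinear(nn.Linear(256, 1), rank=4)
opt = torch.optim.AdamW((p for p in head.parameters() if p.requires_grad), lr=1e-3)
features = torch.randn(64, 256)      # stand-in for 64 pixels' backbone features
corrected_mask = torch.ones(64, 1)   # stand-in for the user's corrected labels
online_update(head, features, corrected_mask, opt)
```

Because only the rank-4 factors receive gradients, an update like this touches a tiny fraction of the weights, which is what could make per-correction training fast enough for a live session.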