AI & ML Paradigm Shift

This paper shows that pre-trained image editing models can be repurposed for video frame interpolation using LoRA fine-tuning on only a few hundred training samples.

March 17, 2026

Original Paper

Edit2Interp: Adapting Image Foundation Models from Spatial Editing to Video Frame Interpolation with Few-Shot Learning

Nasrin Rahimi, Mısra Yavuz, Burak Can Biner, Yunus Bilge Kurt, Ahmet Rasim Emirdağı, Süleyman Aslan, Görkay Aydemir, M. Akın Yılmaz, A. Murat Tekalp

arXiv · 2603.15003

The Takeaway

The paper challenges the assumption that video tasks require dedicated temporal architectures or motion-estimation modules. It demonstrates that the massive spatial priors in foundation image models already contain "latent temporal reasoning" that can be activated with minimal data for video synthesis.

From the abstract

Pre-trained image editing models exhibit strong spatial reasoning and object-aware transformation capabilities acquired from billions of image-text pairs, yet they possess no explicit temporal modeling. This paper demonstrates that these spatial priors can be repurposed to unlock temporal synthesis capabilities through minimal adaptation, without introducing any video-specific architecture or motion estimation modules. We show that a large image editing model (Qwen-Image-Edit), originally designed…
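The "minimal adaptation" the abstract refers to is LoRA: the base model's weights stay frozen, and only a pair of low-rank matrices per adapted layer is trained, which is why a few hundred samples can suffice. A minimal sketch of the mechanism on a single linear layer, in pure numpy, is below; the shapes, hyperparameters, and toy task are illustrative assumptions, not details from the paper.

```python
# Minimal LoRA sketch: the frozen base weight W is augmented with a
# trainable low-rank residual B @ A, so adaptation touches only a
# tiny fraction of parameters. Toy dimensions are for illustration.
import numpy as np

class LoRALinear:
    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out))        # frozen base weight
        self.A = rng.normal(size=(d_in, rank)) * 0.1   # trainable down-projection
        self.B = np.zeros((rank, d_out))               # trainable up-projection (zero init)
        self.scale = alpha / rank                      # standard LoRA scaling

    def forward(self, x):
        # base path plus low-rank residual path
        return x @ self.W + (x @ self.A @ self.B) * self.scale

    def train_step(self, x, y, lr=0.01):
        # gradient descent on A and B only; W never changes
        err = self.forward(x) - y                      # d(0.5*||err||^2)/dpred
        gB = (x @ self.A).T @ err * self.scale
        gA = x.T @ (err @ self.B.T) * self.scale
        self.A -= lr * gA
        self.B -= lr * gB
        return float(np.mean(err ** 2))

# "Few-shot" adaptation: fit a small task shift from just 8 samples.
rng = np.random.default_rng(1)
layer = LoRALinear(6, 3)
x = rng.normal(size=(8, 6))
target = x @ (layer.W + rng.normal(size=(6, 3)) * 0.1)  # slightly shifted task
losses = [layer.train_step(x, target) for _ in range(200)]
```

At rank 4 here, the trainable parameters number (6 + 3) x 4 = 36 versus 18 frozen base weights; at foundation-model scale the same ratio shrinks to well under one percent, which is what makes few-shot video adaptation tractable.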