AI & ML New Capability

Enables training-free infinite video generation (hour-scale) by using evolving memory tokens to solve identity drift and motion stagnation.

arXiv · March 16, 2026 · 2603.12513

Youngrae Kim, Qixin Hu, C.-C. Jay Kuo, Peter A. Beerel

Why it matters

Current autoregressive diffusion models suffer from 'amnesia' as they slide their attention window, leading to character changes or frozen motion. MemRoPE provides a fixed-size cache that compresses global history, allowing for consistent, hour-long video synthesis without retraining.

From the abstract

Autoregressive diffusion enables real-time frame streaming, yet existing sliding-window caches discard past context, causing fidelity degradation, identity drift, and motion stagnation over long horizons. Current approaches preserve a fixed set of early tokens as attention sinks, but this static anchor cannot reflect the evolving content of a growing video. We introduce MemRoPE, a training-free framework with two co-designed components. Memory Tokens continuously compress all past keys into dual