AI & ML Efficiency Breakthrough

Achieves hour-scale real-time human animation by addressing two problems in autoregressive diffusion: unbounded memory growth and inconsistent noise states.

arXiv · March 13, 2026 · 2603.11746

Dingcheng Zhen, Xu Zheng, Ruixin Zhang, Zhiqi Jiang, Yichao Yan, Ming Tao, Shunshun Yin

Why it matters

Neighbor Forcing and a structured ConvKV memory are reported to keep memory usage constant as the generated video grows, so generation length is no longer bounded by memory. If that holds, it removes the hardware-imposed temporal limit that currently confines video generation practitioners to short clips.

From the abstract

Autoregressive (AR) diffusion models offer a promising framework for sequential generation tasks such as video synthesis by combining diffusion modeling with causal inference. Although they support streaming generation, existing AR diffusion methods struggle to scale efficiently. In this paper, we identify two key challenges in hour-scale real-time human animation. First, most forcing strategies propagate sample-level representations with mismatched diffusion states, causing inconsistent learnin