A training-free feature caching framework that achieves a 2.3x speedup for video world models while maintaining 99.4% quality.
March 24, 2026
Original Paper
WorldCache: Content-Aware Caching for Accelerated Video World Models
arXiv · 2603.22286
The Takeaway
Existing diffusion feature caches reuse activations as static snapshots, which causes ghosting in dynamic scenes. WorldCache instead adapts its reuse thresholds to motion and warps cached features before reusing them. The result is roughly 2.3x faster generation in high-fidelity video world models such as Cosmos, with no retraining.
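To make the two ideas in the takeaway concrete, here is a toy Python sketch under stated assumptions. It is not WorldCache's implementation. Features are 1-D lists, an integer shift (`shift`) stands in for optical-flow warping, and the displacement since the last full computation is assumed to be supplied by some motion estimate. The rule that the reuse tolerance shrinks as motion grows (`adaptive_threshold`) is one plausible form of a motion-adaptive threshold, not the paper's formula. All names (`MotionAwareCache`, `relative_l1`, `base_threshold`, `sensitivity`) are hypothetical.

```python
# Toy sketch (hypothetical names, not the paper's code): motion-adaptive
# caching with warped reuse. 1-D lists stand in for feature maps and an
# integer shift stands in for optical-flow warping.


def relative_l1(a, b):
    """Total absolute difference between a and b, relative to b's magnitude."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1e-8
    return num / den


def shift(features, d):
    """Move content right by d positions, clamping at the edges (toy warp)."""
    n = len(features)
    return [features[min(max(i - d, 0), n - 1)] for i in range(n)]


def adaptive_threshold(base, motion, sensitivity=1.0):
    """Assumed form: tolerate less drift when the scene moves more."""
    return base / (1.0 + sensitivity * motion)


class MotionAwareCache:
    def __init__(self, block_fn, base_threshold=0.05, sensitivity=1.0):
        self.block_fn = block_fn            # expensive computation, e.g. a DiT block
        self.base_threshold = base_threshold
        self.sensitivity = sensitivity
        self.ref_input = None               # input at the last full computation
        self.cached_output = None
        self.full_calls = 0

    def __call__(self, x, displacement):
        # displacement: integer motion of the content since the last full
        # computation (assumed to be known here).
        if self.ref_input is not None:
            # Compare against the motion-compensated reference, not the raw one.
            warped_ref = shift(self.ref_input, displacement)
            drift = relative_l1(x, warped_ref)
            limit = adaptive_threshold(
                self.base_threshold, abs(displacement), self.sensitivity
            )
            if drift < limit:
                # Reuse, but warp the cached output so it follows the motion.
                return shift(self.cached_output, displacement)
        # Otherwise recompute and refresh the cache.
        self.cached_output = self.block_fn(x)
        self.ref_input = list(x)
        self.full_calls += 1
        return self.cached_output


def double(x):
    """Pointwise stand-in for an expensive block."""
    return [2.0 * v for v in x]


cache = MotionAwareCache(double)
x0 = [1.0, 5.0, 3.0, 1.0, 1.0]
x1 = shift(x0, 1)            # same content, moved right by one position
cache(x0, 0)                 # full computation
reused = cache(x1, 1)        # served from the warped cache
print(reused)                # [2.0, 2.0, 10.0, 6.0, 2.0]
```

In this toy case a plain snapshot comparison of `x1` against `x0` sees a large relative change (about 0.73) and would recompute; warping the reference first reveals that the content merely moved, so the warped cached output is reused and matches what a full recomputation would produce.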
From the abstract
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising steps; however, existing methods largely rely on a Zero-Order Hold assumption i.e., reusing cached features as static snapshots when global drift is small. This often leads to ghosting artifacts, blur, and motion inconsiste
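The Zero-Order Hold assumption the abstract criticizes can be sketched in a few lines of Python. This is a generic illustration under assumptions, not code from the paper or from any specific caching library: drift is measured as a relative L1 change of the block input against the input seen at the last full computation, and `ZeroOrderHoldCache`, `relative_l1`, and `threshold` are hypothetical names.

```python
# Generic sketch (hypothetical names) of a Zero-Order Hold feature cache:
# while global drift stays below a threshold, the cached output is returned
# unchanged as a static snapshot.


def relative_l1(a, b):
    """Total absolute difference between a and b, relative to b's magnitude."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b) or 1e-8
    return num / den


class ZeroOrderHoldCache:
    def __init__(self, block_fn, threshold=0.05):
        self.block_fn = block_fn      # expensive computation, e.g. a DiT block
        self.threshold = threshold    # assumed drift tolerance
        self.ref_input = None         # input at the last full computation
        self.cached_output = None
        self.full_calls = 0

    def __call__(self, x):
        if self.ref_input is not None and relative_l1(x, self.ref_input) < self.threshold:
            # Zero-Order Hold: hand back the stale snapshot as-is.
            return self.cached_output
        self.cached_output = self.block_fn(x)
        self.ref_input = list(x)
        self.full_calls += 1
        return self.cached_output


def double(x):
    """Pointwise stand-in for an expensive block."""
    return [2.0 * v for v in x]


cache = ZeroOrderHoldCache(double, threshold=0.05)
cache([1.0, 2.0, 3.0])               # full computation
stale = cache([1.01, 2.0, 3.0])      # drift 0.01 / 6 is below 0.05: snapshot reused
print(stale)                         # [2.0, 4.0, 6.0]
```

The reused output ignores the change in the input entirely. One step of this is harmless, but when content keeps moving across steps, serving frozen snapshots like this is the behavior the abstract associates with ghosting and blur.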