AI & ML · New Capability

FrescoDiffusion enables coherent 4K image-to-video (I2V) generation using a training-free tiled diffusion method guided by precomputed latent priors.

arXiv · March 19, 2026 · 2603.17555

Hugo Caselles-Dupré, Mathis Koroglu, Guillaume Jeanneret, Arnaud Dapogny, Matthieu Cord

The Takeaway

FrescoDiffusion addresses the resolution ceiling of current I2V models by fusing local tile details with global latent trajectories. This lets practitioners animate high-resolution, complex scenes such as monumental artworks while preserving global structural consistency, without expensive retraining.
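A minimal NumPy sketch of the general idea of fusing per-tile detail with a global latent follows. It is an illustration under stated assumptions, not the authors' FrescoDiffusion algorithm. Overlapping tiles are denoised independently and averaged where they overlap (a MultiDiffusion-style fusion), and the result is then linearly blended with a nearest-neighbor upsampled low-resolution "global" latent. The names `denoise_fn`, `tile`, `stride`, and `weight`, and the linear blend itself, are hypothetical placeholders. A real pipeline would call an I2V denoiser at each step and use the paper's own prior-fusion rule.

```python
import numpy as np


def tile_starts(size, tile, stride):
    """Start offsets for overlapping tiles that fully cover [0, size)."""
    starts = list(range(0, max(size - tile, 0) + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)  # extra tile flush with the far edge
    return starts


def tiled_denoise(latent, denoise_fn, tile=64, stride=48):
    """Run denoise_fn on overlapping tiles of a (C, H, W) latent and
    average the predictions wherever tiles overlap."""
    latent = np.asarray(latent, dtype=float)
    _, H, W = latent.shape
    acc = np.zeros_like(latent)
    count = np.zeros((1, H, W))
    for y in tile_starts(H, tile, stride):
        for x in tile_starts(W, tile, stride):
            patch = latent[:, y:y + tile, x:x + tile]
            acc[:, y:y + tile, x:x + tile] += denoise_fn(patch)
            count[:, y:y + tile, x:x + tile] += 1.0
    return acc / count  # every pixel is covered at least once


def upsample_nearest(low, factor):
    """Nearest-neighbor upsampling of a (C, h, w) latent by an integer factor."""
    return low.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_with_global_prior(tiled, global_low, factor, weight):
    """Hypothetical fusion rule: linearly blend the tiled estimate with an
    upsampled low-resolution global latent. weight=0 keeps only tile detail,
    weight=1 keeps only the global prior."""
    prior = upsample_nearest(np.asarray(global_low, dtype=float), factor)
    if prior.shape != tiled.shape:
        raise ValueError("global prior does not match the tiled latent shape")
    return (1.0 - weight) * tiled + weight * prior


# Usage with a stand-in "denoiser" (a real pipeline would call an I2V model).
rng = np.random.default_rng(0)
hi_res = rng.standard_normal((4, 128, 128))  # high-resolution latent
lo_res = rng.standard_normal((4, 32, 32))    # precomputed global latent (assumed)
tiled = tiled_denoise(hi_res, lambda p: 0.5 * p, tile=64, stride=48)
fused = fuse_with_global_prior(tiled, lo_res, factor=4, weight=0.3)
print(fused.shape)  # (4, 128, 128)
```

Averaging overlapping tiles hides seams between neighboring tiles. It does not, by itself, enforce a consistent global layout, which is the failure mode of tiled denoising described in the abstract. In this sketch, the blend weight is the single point where global structure is reintroduced.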

From the abstract

Diffusion-based image-to-video (I2V) models are increasingly effective, yet they struggle to scale to ultra-high-resolution inputs (e.g., 4K). Generating videos at the model's native resolution often loses fine-grained structure, whereas high-resolution tiled denoising preserves local detail but breaks global layout consistency. This failure mode is particularly severe in the fresco animation setting: monumental artworks containing many distinct characters, objects, and semantically different su