AI & ML · New Capability

Breaks the resolution and aspect-ratio barriers of image diffusion models, enabling the generation of structurally consistent 32K-resolution images.

March 26, 2026

Original Paper

ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors

Haodong Yu, Yabo Zhang, Donglin Di, Ruyi Zhang, Wangmeng Zuo

arXiv · 2603.24270

The Takeaway

By repurposing video diffusion priors to guide spatial expansion, this framework synthesizes ultra-high-resolution panoramas without the object-repetition artifacts that standard image diffusion models typically produce at extreme resolutions and aspect ratios. It is a notable advance for high-fidelity asset generation in virtual environments.
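One way to picture "spatial expansion handled by a video prior" is to treat a very wide canvas as a sequence of overlapping viewport crops, as if a camera were scrolling across it, so that each crop plays the role of a video frame. The sketch below only illustrates that decomposition under this assumption. The summary does not describe ScrollScape's actual pipeline, and the names `Frame` and `scroll_frames`, along with the window width and overlap values, are hypothetical choices rather than the authors'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int  # position in the hypothetical "video" sequence
    x0: int     # left edge of the viewport on the canvas (inclusive)
    x1: int     # right edge of the viewport on the canvas (exclusive)


def scroll_frames(canvas_width: int, frame_width: int, overlap: int) -> list[Frame]:
    """Split a wide canvas into left-to-right overlapping viewport windows.

    Hypothetical illustration (not ScrollScape's method): consecutive windows
    share at least `overlap` pixels, so each "frame" is a small pan of a
    virtual camera relative to the previous one.
    """
    if not 0 <= overlap < frame_width:
        raise ValueError("overlap must be in [0, frame_width)")
    if frame_width > canvas_width:
        raise ValueError("frame_width must not exceed canvas_width")
    stride = frame_width - overlap
    starts = list(range(0, canvas_width - frame_width + 1, stride))
    # If the regular stride stops short, add one last window flush with the right edge.
    if starts[-1] + frame_width < canvas_width:
        starts.append(canvas_width - frame_width)
    return [Frame(i, x0, x0 + frame_width) for i, x0 in enumerate(starts)]


if __name__ == "__main__":
    # A 4096-px-wide strip viewed through 1024-px windows with 256 px of overlap.
    for frame in scroll_frames(canvas_width=4096, frame_width=1024, overlap=256):
        print(frame)
```

Mapping the long axis to time is what would let a model's frame-to-frame consistency prior, rather than a single-image prior, decide how content continues from one window to the next. Whether ScrollScape slices the canvas this way is an assumption of this sketch.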

From the abstract

While diffusion models excel at generating images with conventional dimensions, pushing them to synthesize ultra-high-resolution imagery at extreme aspect ratios (EAR) often triggers catastrophic structural failures, such as object repetition and spatial […]. This limitation fundamentally stems from a lack of robust spatial priors, as static text-to-image models are primarily trained on image distributions with conventional […]. To overcome this bottleneck, we present ScrollScape, a novel …