Breaks the resolution and aspect-ratio barriers of image diffusion models, enabling the generation of consistent 32K-resolution images.
March 26, 2026
Original Paper
ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors
arXiv · 2603.24270
The Takeaway
By repurposing video diffusion priors to manage spatial expansion, this framework allows for the synthesis of ultra-high-resolution panoramas without the repetition artifacts common in standard diffusion models. It represents a significant leap in high-fidelity asset generation for virtual environments.
From the abstract
While diffusion models excel at generating images with conventional dimensions, pushing them to synthesize ultra-high-resolution imagery at extreme aspect ratios (EAR) often triggers catastrophic structural failures, such as object repetition and spatial [...]. This limitation fundamentally stems from a lack of robust spatial priors, as static text-to-image models are primarily trained on image distributions with conventional [...]. To overcome this bottleneck, we present ScrollScape, a novel