Proposes the Spectrum Matching Hypothesis to explain why some VAE latents are 'undiffusable' and introduces techniques to align power spectral densities for superior image generation.
March 17, 2026
Original Paper
Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion
arXiv · 2603.14645
The Takeaway
The paper links the geometry of latent spaces to signal processing, giving a unified theoretical framework for latent diffusion. Its proposed ESM and DSM methods consistently improve generation quality on standard benchmarks such as CelebA and ImageNet.
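The power spectral density (PSD) that this framework is built around can be estimated with a few lines of NumPy. The sketch below is an illustration only, not the paper's ESM or DSM method: it radially averages the 2D power spectrum of an array and fits a straight line in log-log space to estimate a power-law exponent. The function names `radial_psd`, `power_law_slope`, and `shaped_noise` are hypothetical, and the input is synthetic noise rather than real images or VAE latents.

```python
import numpy as np


def radial_psd(x):
    """Radially averaged power spectral density of a 2D array."""
    h, w = x.shape
    # 2D power spectrum, with the zero frequency shifted to the centre.
    power = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    # Integer radial frequency of every bin, measured from the centre.
    fy = np.arange(h) - h // 2
    fx = np.arange(w) - w // 2
    r = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2).round().astype(int)
    # Mean power over all bins that share the same integer radius.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    max_r = min(h, w) // 2  # keep only radii fully covered by the grid
    return sums[:max_r] / counts[:max_r]


def power_law_slope(psd):
    """Least-squares slope of log PSD against log frequency, skipping DC."""
    f = np.arange(1, len(psd))
    slope, _ = np.polyfit(np.log(f), np.log(psd[1:]), 1)
    return slope


def shaped_noise(n, exponent, seed=0):
    """Synthetic test signal: white noise filtered so that PSD ~ f^(-exponent)."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))
    freqs = np.fft.fftfreq(n)
    f = np.sqrt(freqs[:, None] ** 2 + freqs[None, :] ** 2)
    f[0, 0] = 1.0  # avoid dividing by zero at the DC bin
    amplitude = f ** (-exponent / 2.0)
    return np.fft.ifft2(np.fft.fft2(white) * amplitude).real


if __name__ == "__main__":
    for exponent in (0.0, 2.0):
        slope = power_law_slope(radial_psd(shaped_noise(128, exponent)))
        print(f"target exponent {exponent:.1f}: fitted slope {slope:.2f}")
```

On the synthetic inputs, the fitted slope should come out near zero for white noise and near the negative of the chosen exponent for the filtered noise. Applying the same measurement to a single channel of a latent tensor would be one way to inspect how its spectrum compares with that of images, under the assumption that the latent is laid out on a regular 2D grid.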
From the abstract
In this paper, we study the diffusability (learnability) of variational autoencoders (VAE) in latent diffusion. First, we show that pixel-space diffusion trained with an MSE objective is inherently biased toward learning low and mid spatial frequencies, and that the power-law power spectral density (PSD) of natural images makes this bias perceptually beneficial. Motivated by this result, we propose the Spectrum Matching Hypothesis: latents with superior diffusability should (i) follow a f