AI & ML Paradigm Shift

Proposes the Spectrum Matching Hypothesis to explain why some VAE latents are 'undiffusable' and introduces techniques to align power spectral densities for superior image generation.

March 17, 2026

Original Paper

Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion

Mang Ning, Mingxiao Li, Le Zhang, Lanmiao Liu, Matthew B. Blaschko, Albert Ali Salah, Itir Onal Ertugrul

arXiv · 2603.14645

The Takeaway

The paper links the geometry of latent spaces to signal processing, providing a unified theoretical framework for latent diffusion. The proposed ESM and DSM methods consistently improve generation quality on standard benchmarks such as CelebA and ImageNet.
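To make the signal-processing side concrete, here is a minimal sketch of how a power spectral density (PSD) and its power-law slope might be measured for a single-channel image or latent map. It is not the paper's ESM or DSM procedure. The function names (`radial_psd`, `power_law_slope`, `synthetic_power_law_image`) and the synthetic test signal are assumptions introduced only for illustration.

```python
import numpy as np

def radial_psd(img):
    """Radially averaged power spectral density of a 2-D array."""
    h, w = img.shape
    # Remove the mean so the DC term does not dominate, then centre the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    power = np.abs(spectrum) ** 2
    yy, xx = np.indices((h, w))
    # Integer ring index: distance from the spectrum centre, rounded to nearest.
    ring = np.rint(np.hypot(yy - h // 2, xx - w // 2)).astype(int)
    sums = np.bincount(ring.ravel(), weights=power.ravel())
    counts = np.bincount(ring.ravel())
    psd = sums / np.maximum(counts, 1)
    # Drop the DC ring and keep only rings that lie fully inside the grid.
    return psd[1 : min(h, w) // 2]

def power_law_slope(psd):
    """Least-squares slope of log(PSD) against log(frequency index)."""
    freqs = np.arange(1, len(psd) + 1)
    slope, _intercept = np.polyfit(np.log(freqs), np.log(psd), 1)
    return slope

def synthetic_power_law_image(n, alpha, seed=0):
    """Hypothetical test signal: white noise shaped so that PSD ~ 1 / f**alpha."""
    rng = np.random.default_rng(seed)
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    radius = np.hypot(fy, fx)
    radius[0, 0] = 1.0                   # avoid division by zero at DC
    amplitude = radius ** (-alpha / 2)   # power scales as amplitude squared
    amplitude[0, 0] = 0.0                # zero-mean output
    noise = rng.standard_normal((n, n))
    return np.fft.ifft2(np.fft.fft2(noise) * amplitude).real

if __name__ == "__main__":
    white = np.random.default_rng(1).standard_normal((128, 128))
    shaped = synthetic_power_law_image(128, alpha=2.0)
    print("white-noise slope:", power_law_slope(radial_psd(white)))
    print("power-law slope:  ", power_law_slope(radial_psd(shaped)))
```

On white noise the fitted slope should be close to zero (a flat spectrum), while the shaped signal should give a clearly negative slope near the chosen `alpha`. Comparing such slopes between images and latents is one simple way to inspect how closely their spectra match.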

From the abstract

In this paper, we study the diffusability (learnability) of variational autoencoders (VAE) in latent diffusion. First, we show that pixel-space diffusion trained with an MSE objective is inherently biased toward learning low and mid spatial frequencies, and that the power-law power spectral density (PSD) of natural images makes this bias perceptually beneficial. Motivated by this result, we propose the Spectrum Matching Hypothesis: latents with superior diffusability should (i) follow a f