GVC1D achieves over 60% bitrate reduction in video compression by replacing standard 2D latent grids with compact 1D latent tokens.
March 17, 2026
Original Paper
Generative Video Compression with One-Dimensional Latent Representation
arXiv · 2603.15302
The Takeaway
The paper challenges the assumption that video latents must keep a 2D spatial grid. A 1D representation lets the model attend adaptively to semantic regions and better aggregate long-term temporal correlations, yielding significantly higher compression efficiency than current state-of-the-art codecs.
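This summary does not describe the GVC1D architecture, so the following is only a rough illustration of the general idea: a small, fixed set of 1D tokens can each draw on any region of a frame instead of mirroring its patch layout. The sketch uses plain cross-attention pooling. The function `pool_grid_to_tokens`, the `queries` (standing in for learned parameters), and all sizes are hypothetical choices made for this example, not details taken from the paper.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pool_grid_to_tokens(grid, queries):
    """Cross-attention pooling (illustrative, not the paper's method).

    grid:    H x W x D nested lists, a 2D latent grid of D-dim patch features
    queries: K x D nested lists, hypothetical learned queries (one per 1D token)
    returns: K x D nested lists, one pooled vector per 1D token
    """
    patches = [p for row in grid for p in row]  # flatten to H*W patches
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    tokens = []
    for q in queries:
        # Each token scores every patch, so it is not tied to one grid cell.
        scores = [scale * sum(qi * pi for qi, pi in zip(q, p)) for p in patches]
        weights = softmax(scores)
        token = [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]
        tokens.append(token)
    return tokens


random.seed(0)
H, W, D, K = 8, 8, 4, 6  # assumed toy sizes: 64 grid cells, 6 tokens
grid = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]  # random stand-ins

tokens = pool_grid_to_tokens(grid, queries)
print(len(grid) * len(grid[0]), "grid positions ->", len(tokens), "tokens")
```

Here the number of tokens `K` is chosen independently of the grid size `H * W`, which is the property the takeaway attributes to a 1D representation. In a trained model the queries would be learned so that tokens concentrate on informative regions; with random queries, as here, the weights carry no semantic meaning.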
From the abstract
Recent advancements in generative video codec (GVC) typically encode video into a 2D latent grid and employ high-capacity generative decoders for reconstruction. However, this paradigm still leaves two key challenges in fully exploiting spatial-temporal redundancy: Spatially, the 2D latent grid inevitably preserves intra-frame redundancy due to its rigid structure, where adjacent patches remain highly similar, thereby necessitating a higher bitrate. Temporally, the 2D latent grid is less effecti