AI & ML Paradigm Shift

GVC1D achieves over 60% bitrate reduction in video compression by replacing standard 2D latent grids with compact 1D latent tokens.

March 17, 2026

Original Paper

Generative Video Compression with One-Dimensional Latent Representation

Zihan Zheng, Zhaoyang Jia, Naifu Xue, Jiahao Li, Bin Li, Zongyu Guo, Xiaoyi Zhang, Zhenghao Chen, Houqiang Li, Yan Lu

arXiv · 2603.15302

The Takeaway

GVC1D challenges the assumption that video latents must keep a 2D spatial grid. Its 1D token representation lets the model attend adaptively to semantic regions and aggregate long-term temporal correlations more effectively, which the authors report yields substantially higher compression efficiency than current state-of-the-art codecs.
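
To make the grid-versus-token contrast concrete, here is a minimal sketch of one generic way a 2D grid of patch features can be pooled into a short 1D token sequence: a small set of query vectors cross-attends over all patches. This is an illustration only and is not taken from the paper. The function name pool_grid_to_tokens, the random "queries" standing in for learned tokens, and the sizes H, W, D, K are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_grid_to_tokens(grid, queries):
    """Cross-attend each query over all patches of a 2D grid.

    grid:    H x W list of D-dimensional patch feature vectors
    queries: K query vectors of dimension D (hypothetical stand-ins
             for learned 1D latent tokens)
    returns: (K token vectors, K attention maps of length H * W)
    """
    patches = [patch for row in grid for patch in row]  # flatten H x W -> N
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    tokens, attention_maps = [], []
    for query in queries:
        # Scaled dot-product score between this query and every patch.
        scores = [scale * sum(q * p for q, p in zip(query, patch))
                  for patch in patches]
        weights = softmax(scores)
        # Each token is an attention-weighted average of patch features,
        # so it can draw from any region rather than one fixed grid cell.
        token = [sum(w * patch[d] for w, patch in zip(weights, patches))
                 for d in range(dim)]
        tokens.append(token)
        attention_maps.append(weights)
    return tokens, attention_maps


# Assumed toy sizes: an 8 x 8 grid of 4-dim features pooled into 6 tokens.
rng = random.Random(0)
H, W, D, K = 8, 8, 4, 6
grid = [[[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(W)]
        for _ in range(H)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
tokens, attention_maps = pool_grid_to_tokens(grid, queries)
print(len(tokens), "tokens summarize", H * W, "patches")
```

In this sketch the number of tokens K is chosen independently of the grid size, which is the property that lets a 1D representation avoid spending one latent per spatial position. How GVC1D actually builds, trains, and entropy-codes its tokens is described in the paper and is not reproduced here.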

From the abstract

Recent advancements in generative video codec (GVC) typically encode video into a 2D latent grid and employ high-capacity generative decoders for reconstruction. However, this paradigm still leaves two key challenges in fully exploiting spatial-temporal redundancy: Spatially, the 2D latent grid inevitably preserves intra-frame redundancy due to its rigid structure, where adjacent patches remain highly similar, thereby necessitating a higher bitrate. Temporally, the 2D latent grid is less effecti

