AI & ML New Capability

Introduces the first discrete generation model capable of handling high-dimensional (768-1024 dims) representation tokens.

arXiv · March 20, 2026 · 2603.19232

Yuqing Wang, Chuofan Ma, Zhijie Lin, Yao Teng, Lijun Yu, Shuai Wang, Jiaming Han, Jiashi Feng, Yi Jiang, Xihui Liu

The Takeaway

Current discrete generation models are limited to low-dimensional latents (8-32 dims), sacrificing semantic richness. By enabling generation directly on high-dimensional tokens, this work lets models predict and generate with the same rich features used for understanding (e.g., CLIP/LLM embeddings), without a lossy bottleneck.
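To make the dimensionality contrast concrete, here is a minimal, illustrative sketch (not the paper's method) of the standard discretization step both regimes share: mapping each continuous feature vector to the index of its nearest codebook entry. The `quantize` helper and the codebook sizes are hypothetical; the point is only that the same lookup applies whether the vectors are 16-dim VQ-style latents or 768-dim representation tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    features: (N, D) array of continuous vectors.
    codebook: (K, D) array of learned code vectors.
    Returns an (N,) array of discrete token indices.
    """
    # Broadcast to (N, K) squared Euclidean distances, then take the argmin.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Low-dimensional regime: 16-dim latents, as in typical VQ tokenizers.
low_codebook = rng.normal(size=(1024, 16))
low_tokens = quantize(rng.normal(size=(4, 16)), low_codebook)

# High-dimensional regime: 768-dim representation tokens (CLIP-like width).
high_codebook = rng.normal(size=(1024, 768))
high_tokens = quantize(rng.normal(size=(4, 768)), high_codebook)
```

The lookup itself is dimension-agnostic; the paper's contribution concerns making *generation* over the high-dimensional discrete space tractable, which this toy quantizer does not address.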

From the abstract

Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional latent tokens (typically 8-32 dims), sacrificing the semantic richness essential for understanding. While high-dimensional pretrained representations (768-1024 dims) could bridge this gap, their discrete generation poses fund…