AI & ML New Capability

LLMs can be fine-tuned to act as their own 'Z-token' compressors, achieving 18x text reduction without losing reconstruction fidelity.

March 27, 2026

Original Paper

Large Language Model as Token Compressor and Decompressor

Wenbing Li, Zikai Song, Jielei Zhang, Tianhao Zhao, Junkai Lin, Yiran Wang, Wei Yang

arXiv · 2603.25340

The Takeaway

By translating text into a compact internal latent language, this method bypasses traditional token limits. It enables extremely long-context reasoning and more efficient prompt caching by treating the LLM itself as a semantic compressor.
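To make the idea concrete, here is a minimal usage sketch, not the authors' code: it assumes a causal LM that has already been fine-tuned so that a "compress:" instruction emits Z-tokens and a "decompress:" instruction reconstructs the original text. The checkpoint name, prompt formats, and token budgets below are hypothetical.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "my-org/llm-z-token-compressor"  # hypothetical fine-tuned checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL)


def compress(text: str) -> str:
    """Ask the fine-tuned LLM to emit its compact Z-token encoding of `text`."""
    enc = tok(f"compress: {text}", return_tensors="pt")
    n_in = enc["input_ids"].shape[1]
    out = lm.generate(**enc, max_new_tokens=max(8, n_in // 18))  # ~18x budget
    return tok.decode(out[0, n_in:])


def decompress(z_tokens: str) -> str:
    """Reconstruct the original text from its Z-token encoding."""
    enc = tok(f"decompress: {z_tokens}", return_tensors="pt")
    out = lm.generate(**enc, max_new_tokens=4096)
    return tok.decode(out[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)


# A long document shrinks to a short Z-token string that can be cached or
# placed in a prompt, then expanded back only when the original wording is needed.
z = compress(open("long_document.txt").read())
restored = decompress(z)
```

In a long-context pipeline, the cached Z-token string stands in for the full prompt, which is what makes the prompt-caching and long-context gains possible.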

From the abstract

In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate this, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them. The resulting representation is content-adaptive: semantically dense segments receive more …
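A rough picture of that autoencoding objective, sketched under stated assumptions rather than from the paper's released code: one shared causal LM first emits discrete Z-tokens for the input, then is trained with a standard next-token loss to reconstruct the input from those codes. The base model, codebook size, prompt formats, and the simplification of sampling the compression pass without gradients are all illustrative; the paper's exact mechanism for learning the discrete codes may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"  # stand-in for any pretrained causal LM
tok = AutoTokenizer.from_pretrained(BASE)
lm = AutoModelForCausalLM.from_pretrained(BASE)

# Extend the vocabulary with a small codebook of discrete Z-tokens.
z_vocab = [f"<z{i}>" for i in range(1024)]
tok.add_special_tokens({"additional_special_tokens": z_vocab})
lm.resize_token_embeddings(len(tok))

optimizer = torch.optim.AdamW(lm.parameters(), lr=1e-5)


def train_step(text: str) -> float:
    # 1) Compression pass: let the model emit a short run of codes for the
    #    input (sampled without gradients in this simplified sketch).
    enc = tok(f"compress: {text}", return_tensors="pt",
              truncation=True, max_length=512)
    with torch.no_grad():
        gen = lm.generate(**enc, max_new_tokens=32, do_sample=True)
    z_str = tok.decode(gen[0, enc["input_ids"].shape[1]:])

    # 2) Reconstruction pass: train the same model to regenerate the original
    #    text conditioned on the codes (causal-LM loss, with the prompt
    #    portion masked out of the labels).
    prompt_ids = tok(f"decompress: {z_str}\n", return_tensors="pt")["input_ids"]
    text_ids = tok(text, return_tensors="pt",
                   truncation=True, max_length=512)["input_ids"]
    input_ids = torch.cat([prompt_ids, text_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # supervise only the reconstruction

    loss = lm(input_ids=input_ids, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the codes are variable-length, the number of Z-tokens spent on a passage can track its semantic density, which is what the content-adaptive claim refers to.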