LLMs can be fine-tuned to act as their own 'Z-token' compressors, achieving 18x text reduction without losing reconstruction fidelity.
March 27, 2026
Original Paper
Large Language Model as Token Compressor and Decompressor
arXiv · 2603.25340
The Takeaway
By translating text into a compact internal latent language, this method bypasses traditional token limits. It enables extremely long-context reasoning and more efficient prompt caching by treating the LLM itself as a semantic compressor.
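To make the caching idea concrete, here is a minimal illustrative sketch of the compress-once, reuse-many-times pattern this enables. The compress() and answer() helpers are hypothetical stand-ins for the fine-tuned model's compressor and decompressor roles, not an API from the paper.

```python
# Illustrative usage pattern only: compress a long document once into Z-tokens,
# cache them, and reuse the short Z-token prefix for many queries.
CACHE: dict[str, list[int]] = {}

def compress(document: str) -> list[int]:
    """Hypothetical: run the fine-tuned LLM as a compressor and return
    the discrete Z-token ids encoding `document`."""
    raise NotImplementedError

def answer(z_tokens: list[int], question: str) -> str:
    """Hypothetical: condition the same LLM on the cached Z-token prefix
    instead of the raw document, then generate an answer."""
    raise NotImplementedError

def ask(document: str, question: str) -> str:
    # A roughly 18x shorter Z-token prefix stands in for the raw text, so the
    # per-query prompt (and any KV cache built from it) is much smaller.
    if document not in CACHE:
        CACHE[document] = compress(document)
    return answer(CACHE[document], question)
```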
From the abstract
In this paper, we establish the novel insight that an off-the-shelf LLM can function as an excellent token compressor and decompressor. To demonstrate, we design a self-expressive autoencoding learning framework that fine-tunes a pretrained LLM to translate long texts into a compact internal language of discrete, variable-length latent codes, termed Z-tokens, and to reconstruct the original text exactly from them. The resulting representation is content-adaptive: semantically dense segments receive more […]
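A rough sketch of how such a self-expressive autoencoding objective might be set up with a Hugging Face causal LM is shown below. The Z-token codebook size, the <compress>/<decompress> markers, and the build_autoencoding_batch helper are illustrative assumptions, not the paper's implementation, which learns content-adaptive, variable-length codes rather than a fixed placeholder prefix.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"        # small stand-in for the pretrained LLM being fine-tuned
NUM_Z_TOKENS = 512         # assumed size of the discrete Z-token codebook

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Reserve fresh embeddings for the Z-token codebook plus compress/decompress markers.
tokenizer.add_special_tokens(
    {"additional_special_tokens": [f"<z_{i}>" for i in range(NUM_Z_TOKENS)]
                                  + ["<compress>", "<decompress>"]}
)
model.resize_token_embeddings(len(tokenizer))


def build_autoencoding_batch(text, z_ids):
    """Lay out one round-trip example:
    [text] <compress> [Z codes] <decompress> [text again],
    with the loss applied only to the reconstruction span."""
    text_ids = tokenizer(text, return_tensors="pt").input_ids[0]
    comp = tokenizer.convert_tokens_to_ids("<compress>")
    decomp = tokenizer.convert_tokens_to_ids("<decompress>")
    input_ids = torch.cat([
        text_ids,
        torch.tensor([comp]),
        torch.tensor(z_ids),
        torch.tensor([decomp]),
        text_ids,
    ])
    labels = input_ids.clone()
    # Mask everything before the reconstruction target so the model is trained
    # to reproduce the original text exactly from the Z-tokens alone.
    labels[: len(text_ids) + 1 + len(z_ids) + 1] = -100
    return input_ids.unsqueeze(0), labels.unsqueeze(0)


# Toy step: in practice the Z-token ids would come from the compressor pass and
# their count would adapt to the content; here they are fixed placeholders.
z_ids = tokenizer.convert_tokens_to_ids([f"<z_{i}>" for i in range(4)])
input_ids, labels = build_autoencoding_batch("A long passage to round-trip.", z_ids)
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
```

In this sketch the reconstruction loss only trains the decompression direction; the paper's framework additionally has to make the Z-token span itself a learned, variable-length output of the compressor rather than a fixed set of placeholder ids.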