Prompt compression can paradoxically increase total energy consumption and cost by over 2000% due to aggressive model 'output expansion'.
March 26, 2026
Original Paper
The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression
arXiv · 2603.23528
The Takeaway
The paper challenges the prevailing industry practice of compressing prompts to save money, showing that reducing input tokens often triggers high-entropy, long-winded model responses. Practitioners should optimize for total trajectory cost, input and output tokens together, rather than input token count alone.
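The takeaway reduces to a simple accounting identity: trajectory cost is input spend plus output spend, so input savings only help if the output side holds steady. Below is a minimal sketch of that comparison; the token counts, prices, and the `trajectory_cost` helper are hypothetical illustrations, not values or code from the paper.

```python
def trajectory_cost(input_tokens: int, output_tokens: int,
                    price_in: float, price_out: float) -> float:
    """Total cost of one request: input spend plus output spend."""
    return input_tokens * price_in + output_tokens * price_out

# Hypothetical per-token prices (USD per million tokens); output tokens
# typically cost several times more than input tokens.
PRICE_IN = 0.15 / 1e6
PRICE_OUT = 0.60 / 1e6

# Uncompressed prompt: long input, short answer.
baseline = trajectory_cost(2_000, 300, PRICE_IN, PRICE_OUT)

# Compressed prompt: input shrinks, but the model "expands" its output,
# e.g. re-deriving context the compressed prompt no longer carries.
compressed = trajectory_cost(400, 8_000, PRICE_IN, PRICE_OUT)

change_pct = 100 * (compressed - baseline) / baseline
print(f"baseline:   ${baseline:.6f}")
print(f"compressed: ${compressed:.6f}")
print(f"change:     {change_pct:+.0f}%")
```

With output tokens priced several times above input tokens, even a modest burst of output expansion (here, hypothetically, ~27x) swamps the input savings, which is why optimizing input token count in isolation can backfire.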
From the abstract
The rapid proliferation of Large Language Models has created an environmental paradox: the very technology that could help solve climate challenges is itself becoming a significant contributor to global carbon emissions. We test whether prompt compression improves inference energy efficiency in 28,421 successful API trials (28,428 planned) across three providers (OpenAI GPT-4o-mini, Anthropic Claude-3.5-Sonnet, and DeepSeek-Chat), five benchmarks (HumanEval, MBPP, GSM8K, MATH, MMLU), and four co […]