Standard decoding strategies (top-k, nucleus) create a 'truncation blind spot' by systematically excluding human-like, low-probability token choices.
March 20, 2026
Original Paper
The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
arXiv · 2603.18482
The Takeaway
The paper demonstrates that the detectability of machine-generated text is a direct byproduct of likelihood-based decoding rather than of model capability. This suggests that making LLMs more human-like requires moving beyond current truncation-based sampling methods.
From the abstract
Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding.
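To make the blind spot concrete, here is a minimal sketch (not from the paper; the toy distribution and thresholds are illustrative) of how top-k and nucleus filtering zero out the low-probability tail of a next-token distribution, so a rare but contextually apt token can never be sampled:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize; all others get 0."""
    out = np.zeros_like(probs)
    keep = np.argsort(probs)[::-1][:k]
    out[keep] = probs[keep]
    return out / out.sum()

def nucleus_filter(probs, p):
    """Keep the smallest prefix of tokens (by probability) whose mass reaches p."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # number of tokens retained
    out = np.zeros_like(probs)
    keep = order[:cutoff]
    out[keep] = probs[keep]
    return out / out.sum()

# Toy next-token distribution: suppose the last token (p = 0.03) is the
# contextually appropriate but statistically rare human choice.
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])

# Under both filters the rare token's probability becomes exactly 0:
# it is unreachable no matter how many samples are drawn.
print(top_k_filter(probs, k=2))
print(nucleus_filter(probs, p=0.9))
```

A human writer can still pick that 3% token when it is the right word; a sampler operating on the filtered distribution cannot, which is the truncation blind spot the abstract describes.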