Standard decoding strategies (top-k, nucleus) create a 'truncation blind spot' by systematically excluding human-like, low-probability token choices.
March 20, 2026
Original Paper
The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
arXiv · 2603.18482
The Takeaway
The paper demonstrates that the detectability of machine-generated text is a direct byproduct of likelihood-based decoding rather than of model capability. This suggests that making LLMs more human-like requires moving beyond current truncation-based sampling methods.
From the abstract
Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding.
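To make the blind spot concrete, here is a minimal sketch (not from the paper; the toy distribution and thresholds are illustrative) of how top-k and nucleus filtering zero out the low-probability tail of a next-token distribution, so a rare but contextually apt token can never be sampled:

```python
import numpy as np

def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize; all others get 0."""
    out = np.zeros_like(probs)
    keep = np.argsort(probs)[::-1][:k]
    out[keep] = probs[keep]
    return out / out.sum()

def nucleus_filter(probs, p):
    """Keep the smallest prefix of tokens (by probability) whose mass reaches p."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # number of tokens retained
    out = np.zeros_like(probs)
    keep = order[:cutoff]
    out[keep] = probs[keep]
    return out / out.sum()

# Toy next-token distribution: suppose the last token (p = 0.03) is the
# contextually appropriate but statistically rare human choice.
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])

# Under both filters the rare token's probability becomes exactly 0:
# it is unreachable no matter how many samples are drawn.
print(top_k_filter(probs, k=2))
print(nucleus_filter(probs, p=0.9))
```

A human writer can still pick that 3% token when it is the right word; a sampler operating on the filtered distribution cannot, which is the truncation blind spot the abstract describes.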