Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.
arXiv · March 13, 2026 · 2603.11749
Why it matters
This shifts the understanding of 'truth' in LLMs from a semantic property to a structural byproduct of compression efficiency. It explains why truth bias emerges from next-token prediction and provides a roadmap for improving model accuracy through architectural constraints rather than just data volume.
From the abstract
Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression-Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data. Truth bias emerges only when false alternatives are structurally harder to compress. We test this using small GPT-2-style character-level transformers (3.5M–86M parameters) on synthetic math corpora with controlled mixtures …
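The core intuition — that internally consistent data admits shorter descriptions — can be illustrated with a toy sketch. This is my own illustration, not the paper's experimental setup: a corpus of true addition statements has answers determined by the operands, so it carries less entropy and compresses shorter than a corpus with random answers.

```python
import random
import zlib

random.seed(0)

def corpus(truthful: bool, n: int = 2000) -> bytes:
    """Build a toy character-level corpus of addition statements.

    truthful=True  -> answers follow a+b (internally consistent, low entropy)
    truthful=False -> answers are random (inconsistent, high entropy)
    """
    lines = []
    for _ in range(n):
        a, b = random.randint(0, 9), random.randint(0, 9)
        c = a + b if truthful else random.randint(0, 18)
        lines.append(f"{a}+{b}={c}")
    return "\n".join(lines).encode()

# A generic compressor stands in for the model's implicit code length.
true_len = len(zlib.compress(corpus(True), 9))
false_len = len(zlib.compress(corpus(False), 9))
print(true_len < false_len)  # the consistent corpus compresses shorter
```

Under the principle, a next-token predictor would analogously assign the consistent (true) statements a shorter effective description, which is where the truth bias comes from.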