The 'Reasoning Contamination Effect' shows that Chain-of-Thought (CoT) reasoning disrupts a model's internal confidence signal, degrading calibration.
March 27, 2026
Original Paper
Closing the Confidence-Faithfulness Gap in Large Language Models
arXiv · 2603.25052
The Takeaway
Researchers found that the model's internal accuracy signal and its verbalized confidence are encoded along orthogonal directions in activation space, and that reasoning misaligns them further. This provides a mechanistic explanation for why smarter models often sound more overconfident, and it suggests a steering-based fix.
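To make the orthogonality claim concrete, here is a minimal, self-contained sketch of the probing setup: fit one linear probe on hidden activations to predict answer correctness and another to predict verbalized confidence, then compare the probe directions. The synthetic activations, dimensions, and variable names are illustrative stand-ins, not the paper's data or code.

```python
# Sketch: two linear probes on (synthetic) residual-stream activations.
# A cosine similarity near zero between their weight vectors is the kind of
# evidence the paper uses to argue the two signals occupy orthogonal directions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n = 512, 2000

# Hypothetical latent directions: one drives correctness, an orthogonal one
# drives verbalized confidence (stand-ins for whatever the real model encodes).
acc_dir = rng.normal(size=d_model)
acc_dir /= np.linalg.norm(acc_dir)
conf_dir = rng.normal(size=d_model)
conf_dir -= (conf_dir @ acc_dir) * acc_dir
conf_dir /= np.linalg.norm(conf_dir)

X = rng.normal(size=(n, d_model))                              # fake activations
y_correct = (X @ acc_dir + 0.3 * rng.normal(size=n)) > 0       # answered correctly?
y_high_conf = (X @ conf_dir + 0.3 * rng.normal(size=n)) > 0    # verbalized high confidence?

probe_acc = LogisticRegression(max_iter=1000).fit(X, y_correct)
probe_conf = LogisticRegression(max_iter=1000).fit(X, y_high_conf)

w_acc = probe_acc.coef_.ravel()
w_conf = probe_conf.coef_.ravel()
cos = w_acc @ w_conf / (np.linalg.norm(w_acc) * np.linalg.norm(w_conf))
print(f"cosine(correctness probe, confidence probe) = {cos:.3f}")  # near 0 -> orthogonal
```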
From the abstract
Large language models (LLMs) tend to verbalize confidence scores that are largely detached from their actual accuracy, yet the geometric relationship governing this behavior remains poorly understood. In this work, we present a mechanistic interpretability analysis of verbalized confidence, using linear probes and contrastive activation addition (CAA) steering to show that calibration and verbalized confidence signals are encoded linearly but are orthogonal to one another -- a finding consistent …
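The steering side of the method can be sketched in the same spirit. Contrastive activation addition builds a steering vector from the mean activation difference between two contrasting sets of examples and adds it to a layer's output at inference time. The toy module, layer choice, and steering strength below are assumptions for illustration only, not the paper's actual setup.

```python
# Sketch of CAA steering: steer = mean(calibrated activations) - mean(overconfident
# activations), injected into a layer's output via a forward hook.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 64

# Toy "transformer block" standing in for one layer of a real LLM.
block = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))

# Contrast sets: activations cached from prompts eliciting calibrated vs. overconfident
# answers (random tensors here; in practice these come from the model itself).
acts_calibrated = torch.randn(32, d_model)
acts_overconfident = torch.randn(32, d_model)

# CAA steering vector and strength (a tunable hyperparameter).
steer = acts_calibrated.mean(dim=0) - acts_overconfident.mean(dim=0)
alpha = 4.0

def add_steering(module, inputs, output):
    # Forward hook: shift this layer's output along the contrast direction.
    return output + alpha * steer

handle = block[-1].register_forward_hook(add_steering)

x = torch.randn(1, d_model)      # stand-in for a hidden state entering this layer
steered = block(x)               # output nudged toward the "calibrated" direction
handle.remove()
unsteered = block(x)

shift = (steered - unsteered) @ (steer / steer.norm())
print("shift along steering direction:", shift.item())
```

In this framing, the "fix" is simply generating with the hook active, so verbalized confidence is pushed back toward the direction the probes associate with being well calibrated.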