LLMs compute and cache confidence scores automatically during answer generation, well before they are prompted to verbalize them.
March 19, 2026
Original Paper
How do LLMs Compute Verbal Confidence?
arXiv · 2603.17839
The Takeaway
The paper reveals that verbal confidence is not a post-hoc reconstruction or a simple log-probability readout, but a sophisticated internal self-evaluation stored in specific hidden states. This provides a mechanistic target for improving model calibration and detecting hallucinations through internal probes.
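To make the "internal probe" idea concrete, here is a minimal sketch of a linear probe on hidden states. Everything in it is illustrative: the hidden states are simulated with a planted confidence direction (`w_true` is an assumption, not something the paper provides), and the probe is a plain least-squares regression rather than whatever probing method the authors use.

```python
import numpy as np

# Hypothetical setup: if confidence is cached as a linear direction in a
# hidden state, a simple linear probe should be able to read it out.
rng = np.random.default_rng(0)
d, n = 16, 200                      # hidden size, number of answers (made up)
w_true = rng.normal(size=d)         # planted direction encoding confidence
H = rng.normal(size=(n, d))         # simulated hidden states, one per answer
conf = H @ w_true + 0.01 * rng.normal(size=n)  # simulated confidence scores

# Fit the probe: regress the confidence scores on the hidden states.
w_probe, *_ = np.linalg.lstsq(H, conf, rcond=None)

# Measure how well the probe's readout tracks the confidence signal.
corr = np.corrcoef(H @ w_probe, conf)[0, 1]
print(f"probe/confidence correlation: {corr:.3f}")
```

In this toy setting the probe recovers the planted direction almost perfectly; on a real model, the strength of such a probe is what would indicate that confidence is computed and cached in the hidden states before being verbalized.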
From the abstract
Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality.