Shows how two uncertainty estimation signals, self-consistency and verbalized confidence, scale with sampling and complement each other in reasoning models.
March 20, 2026
Original Paper
How Uncertainty Estimation Scales with Sampling in Reasoning Models
arXiv · 2603.19118
The Takeaway
Provides a roadmap for deploying reliable reasoning models (such as R1 variants) by showing that a hybrid estimator using just two samples can outperform single-signal estimators, even when those estimators get much larger sampling budgets. It also characterizes how these signals depend on the domain, with the strongest scaling in domains such as mathematics that were covered by reinforcement learning with verifiable rewards (RLVR) training.
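The takeaway does not say how the hybrid estimator combines its signals. The sketch below assumes that each parallel sample yields a final answer plus a verbalized confidence in [0, 1], and that the two signals are merged by a simple weighted average. The combination rule, the `weight` parameter, and all function names are hypothetical illustrations, not the paper's method.

```python
from collections import Counter
from typing import Sequence, Tuple

# One parallel sample: (final answer string, verbalized confidence in [0, 1]).
Sample = Tuple[str, float]


def majority_answer(samples: Sequence[Sample]) -> str:
    """Most frequent final answer; ties go to the answer seen first."""
    if not samples:
        raise ValueError("need at least one sample")
    return Counter(answer for answer, _ in samples).most_common(1)[0][0]


def self_consistency(samples: Sequence[Sample]) -> float:
    """Fraction of samples that agree with the majority answer."""
    top = majority_answer(samples)
    return sum(answer == top for answer, _ in samples) / len(samples)


def verbalized_confidence(samples: Sequence[Sample]) -> float:
    """Mean stated confidence among samples that gave the majority answer."""
    top = majority_answer(samples)
    confs = [conf for answer, conf in samples if answer == top]
    return sum(confs) / len(confs)


def hybrid_confidence(samples: Sequence[Sample], weight: float = 0.5) -> float:
    """Hypothetical convex mix: weight * SC + (1 - weight) * verbalized."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    sc = self_consistency(samples)
    vc = verbalized_confidence(samples)
    return weight * sc + (1.0 - weight) * vc


if __name__ == "__main__":
    agree = [("42", 0.9), ("42", 0.7)]     # SC = 1.0, verbalized ~ 0.8
    disagree = [("42", 0.9), ("41", 0.6)]  # SC = 0.5, verbalized = 0.9
    print(hybrid_confidence(agree), hybrid_confidence(disagree))
```

One intuition for this design, not necessarily the paper's explanation: with only two samples, self-consistency can take just two values (0.5 or 1.0), so on its own it sorts questions into at most two bins. Verbalized confidence varies continuously and can break ties within each bin.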
From the abstract
Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box approach using verbalized confidence and self-consistency. Across three reasoning models and 17 tasks spanning mathematics, STEM, and humanities, we characterize how these signals scale. Both self-consistency and verbalized confidence scale in reasoning models, but self-consistency exhibits lower initia
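To make "how these signals scale" concrete, the sketch below measures how well self-consistency separates correct from incorrect answers as the number of parallel samples k grows. The excerpt does not name the evaluation metric. AUROC (the probability that a correctly answered question gets a higher confidence than an incorrectly answered one) is assumed here as a common choice, and the helper names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def self_consistency(answers: Sequence[str]) -> Tuple[str, float]:
    """Majority answer and the fraction of samples that agree with it."""
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """P(correct item scores higher than incorrect item); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_at_budget(samples: List[List[str]], gold: List[str], k: int) -> float:
    """Score each question with self-consistency over its first k samples."""
    scores, labels = [], []
    for answers, ref in zip(samples, gold):
        top, conf = self_consistency(answers[:k])
        scores.append(conf)
        labels.append(top == ref)  # correctness of the majority-vote answer
    return auroc(scores, labels)


if __name__ == "__main__":
    # Toy data: 4 questions x 4 parallel samples (illustrative only).
    samples = [
        ["4", "4", "4", "4"],
        ["7", "9", "7", "7"],
        ["2", "3", "3", "3"],
        ["1", "6", "8", "5"],
    ]
    gold = ["4", "7", "3", "8"]
    for k in (1, 2, 4):
        print(k, auroc_at_budget(samples, gold, k))
```

On this constructed data, AUROC moves from 0.5 at k = 1 (a single sample always has self-consistency 1.0, so every question ties) to 0.75 at k = 2 and 1.0 at k = 4. Both the confidence and the correctness label depend on k, because the majority-vote answer itself can change as samples are added. These numbers describe only the toy example, not the paper's results.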