Reduces the compute cost of LLM test-time scaling by up to 67% using conformal prediction to calibrate reasoning paths.
April 2, 2026
Original Paper
Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning
arXiv · 2604.01170
The Takeaway
ORCA lets models stop reasoning as soon as a calibrated confidence threshold is met, sharply improving the efficiency of expensive 'thinking' models while preserving rigorous theoretical error bounds.
From the abstract
While test-time scaling has enabled large language models to solve highly difficult tasks, state-of-the-art results come at exorbitant compute costs. These inefficiencies can be attributed to the miscalibration of post-trained language models, and the lack of calibration in popular sampling techniques. Here, we present Online Reasoning Calibration (ORCA), a framework for calibrating the sampling process that draws upon conformal prediction and test-time training. Specifically, we introduce a met…
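The stopping rule described above can be sketched with standard split conformal prediction: calibrate a nonconformity-score threshold on held-out examples, then stop sampling reasoning paths once a path's score clears the bound. This is an illustrative toy under stated assumptions, not ORCA's actual procedure; the function names and the choice of `1 - confidence` as the nonconformity score are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    # Split conformal calibration: the ceil((n+1)(1-alpha))-th smallest
    # calibration nonconformity score upper-bounds a fresh test score
    # with probability >= 1 - alpha (assuming exchangeability).
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def calibrated_stop(path_confidences, threshold):
    # Sample reasoning paths in order; stop as soon as one path's
    # nonconformity score (here: 1 - confidence) falls within the
    # calibrated bound. Returns how many paths were sampled.
    for i, conf in enumerate(path_confidences, start=1):
        if 1.0 - conf <= threshold:
            return i
    return len(path_confidences)

# Example: 100 calibration scores, 90% target coverage.
cal = [i / 100 for i in range(1, 101)]
t = conformal_threshold(cal, alpha=0.1)   # -> 0.91
n_paths = calibrated_stop([0.5, 0.85, 0.95], threshold=0.2)  # -> 2
```

The efficiency gain comes from the early return in `calibrated_stop`: instead of always sampling a fixed budget of paths, generation halts at the first path whose calibrated score meets the error guarantee.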