ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.
arXiv · March 16, 2026 · 2603.12372
Why it matters
As 'o1-style' reasoning models become standard, test-time computational efficiency is critical. This method uses confidence variance and steering vectors to stop models from wasting tokens on easy problems, improving both speed and accuracy.
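The confidence-variance idea can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name, window size, and threshold are hypothetical and not taken from the paper, which only describes the mechanism at a high level. The intuition is that low variance in recent token confidences suggests the model has converged and further thinking is redundant, while high variance suggests it is still exploring.

```python
import math

def should_stop_thinking(token_logprobs, window=16, var_threshold=0.05):
    """Hedged sketch of a confidence-variance stopping rule.

    token_logprobs: log-probabilities of the most recently generated tokens.
    Returns True when the variance of confidences in the trailing window
    falls below a (hypothetical) threshold, signaling possible overthinking.
    """
    if len(token_logprobs) < window:
        # Not enough context to judge; keep thinking.
        return False
    # Convert log-probs to probabilities (token confidences).
    recent = [math.exp(lp) for lp in token_logprobs[-window:]]
    mean = sum(recent) / window
    variance = sum((p - mean) ** 2 for p in recent) / window
    return variance < var_threshold
```

In a full system, the opposite signal (sustained high variance late in generation) could instead trigger a steering intervention to encourage further exploration; the threshold values above would need to be tuned per model.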
From the abstract
Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting …