AI & ML Efficiency Breakthrough

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

arXiv · March 16, 2026 · 2603.12372

Yulin Li, Tengyao Tu, Li Ding, Junjie Wang, Huiling Zhen, Yixin Chen, Yong Li, Zhuotao Tian

Why it matters

As 'o1-style' reasoning models become standard, test-time computational efficiency is critical. ReBalance uses confidence-variance signals and activation steering vectors to keep models from wasting tokens on easy problems while deepening exploration on hard ones, improving both speed and accuracy.
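To make the idea concrete, here is a minimal sketch of how a variance-based controller might work. This is an illustration under assumptions, not the paper's implementation: the threshold values, the function names (`confidence_variance`, `steering_scale`, `apply_steering`), and the mapping from variance to a steering coefficient are all hypothetical.

```python
import numpy as np

def confidence_variance(token_probs):
    """Variance of top-1 token probabilities over a window of
    recent reasoning tokens (a proxy for how settled the model is)."""
    return float(np.var(np.asarray(token_probs, dtype=float)))

def steering_scale(token_probs, low=0.005, high=0.05):
    """Map confidence variance to a steering coefficient.
    Thresholds here are illustrative, not from the paper."""
    v = confidence_variance(token_probs)
    if v < low:
        return -1.0  # uniformly confident: prune redundant 'thinking'
    if v > high:
        return +1.0  # unstable confidence: promote further exploration
    return 0.0       # in between: leave generation unsteered

def apply_steering(hidden, direction, scale):
    """Activation steering: add a scaled direction vector to a
    hidden state before the next decoding step."""
    return hidden + scale * direction

# Usage: steady high confidence triggers pruning, oscillating
# confidence triggers exploration.
prune = steering_scale([0.95] * 8)            # variance ~0 -> -1.0
explore = steering_scale([0.1, 0.9, 0.1, 0.9])  # high variance -> +1.0
```

In a real decoding loop, `token_probs` would come from the model's per-step output distributions, and `direction` would be a learned or extracted steering vector; being training-free, the method would apply such adjustments only at inference time.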

From the abstract

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reas…