AI & ML Scaling Insight

Restores monotonic scaling in LLM tree search by replacing standard MCTS selection with Gumbel sampling and Sequential Halving.

March 24, 2026

Original Paper

Revisiting Tree Search for LLMs: Gumbel and Sequential Halving for Budget-Scalable Reasoning

Leonid Ugadiarov, Yuri Kuratov, Aleksandr Panov, Alexey Skrynnik

arXiv · 2603.21162

The Takeaway

Naively increasing the search budget for LLM reasoning has often led to performance drops. The ReSCALE approach restores monotonic scaling, so accuracy actually improves as more compute is allocated at inference time, a vital property for 'o1-style' reasoning models.

From the abstract

Neural tree search is a powerful decision-making algorithm widely used in complex domains such as game playing and model-based reinforcement learning. Recent work has applied AlphaZero-style tree search to enhance the reasoning capabilities of Large Language Models (LLMs) during inference, but we find that this approach suffers from a scaling failure: on GSM8K and Game24, accuracy drops as the search budget increases. In this paper, we present ReSCALE, an adaptation of Gumbel AlphaZero MCTS that …
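To make the mechanism concrete, here is a minimal sketch of the root-selection scheme the abstract refers to: Gumbel AlphaZero samples a set of candidate actions without replacement via the Gumbel-Top-k trick, then spends a fixed simulation budget with Sequential Halving, discarding the weaker half of candidates each round. This is an illustrative simplification, not the paper's implementation: `simulate` is a hypothetical callback standing in for a tree rollout, and the monotone value transform used in the real algorithm is omitted.

```python
import math
import numpy as np

def sequential_halving_with_gumbel(logits, simulate, k=8, budget=32, seed=0):
    """Sketch of Gumbel-Top-k sampling + Sequential Halving at the root.

    `logits`   -- prior policy logits over actions.
    `simulate` -- hypothetical callback: one rollout value estimate for an
                  action (a real implementation descends the search tree).
    Returns the single surviving action after the halving rounds.
    """
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)

    # Gumbel-Top-k trick: perturb logits with Gumbel(0, 1) noise and take
    # the top-k, which samples k distinct actions without replacement.
    g = rng.gumbel(size=len(logits))
    candidates = list(np.argsort(g + logits)[::-1][:k])

    q_sum = np.zeros(len(logits))
    visits = np.zeros(len(logits))
    n_rounds = max(1, math.ceil(math.log2(k)))
    for _ in range(n_rounds):
        # Spread the remaining budget evenly over this round's candidates.
        sims_each = max(1, budget // (n_rounds * len(candidates)))
        for a in candidates:
            for _ in range(sims_each):
                q_sum[a] += simulate(a)
                visits[a] += 1
        # Score = Gumbel noise + prior logit + empirical value estimate
        # (the paper-style sigma transform of q is omitted for brevity).
        q_hat = q_sum[candidates] / np.maximum(visits[candidates], 1)
        score = g[candidates] + logits[candidates] + q_hat
        keep = max(1, len(candidates) // 2)
        order = np.argsort(score)[::-1][:keep]
        candidates = [candidates[i] for i in order]  # keep the better half
    return int(candidates[0])
```

Because the candidate set only shrinks and the total number of simulations is fixed up front, a larger budget strictly refines the value estimates rather than distorting the selection, which is the intuition behind budget-scalable search.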