Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.
arXiv · March 16, 2026 · 2603.12634
Why it matters
Most agents either over-consume resources or fail on complex tasks; this budget-conditioned node selection allows agents to pivot from exploration to exploitation as resources deplete, improving reliability without expensive fine-tuning.
From the abstract
Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, allowing agents to exhaust token and tool budgets on redundant steps or dead-end trajectories. Existing budget-aware methods either require expensive fine-tuning or rely on coarse, trajectory-level heuristics that cannot intervene mid-execution. We propose the Budget-Aware Value Tree (BAVT), a training-free inference-time framework that models multi-