Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.
arXiv · March 16, 2026 · 2603.12529
Why it matters
Reasoning models like o1 often overthink, continuing to spend tokens on reasoning after the answer has already been formed. TERMINATOR predicts when the model is done, allowing for significant inference-cost savings while maintaining or even improving accuracy.
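The idea can be sketched as an early-exit loop: after each thinking step, a predictor scores how likely it is that the answer is already formed, and generation stops once that score crosses a threshold. This is a minimal illustration, not TERMINATOR's actual predictor; the `early_exit_generate` function, the per-step confidence scores, and the threshold are all hypothetical stand-ins.

```python
def early_exit_generate(step_confidences, threshold=0.9):
    """Consume per-step 'answer is ready' scores (hypothetical probe outputs);
    stop at the first score >= threshold.
    Returns (steps_used, total_steps_available)."""
    total = len(step_confidences)
    for i, conf in enumerate(step_confidences, start=1):
        if conf >= threshold:
            # Early exit: the remaining thinking tokens are skipped.
            return i, total
    # No exit point predicted; the full chain-of-thought is used.
    return total, total

# Toy example: confidence rises as the reasoning converges on an answer.
scores = [0.1, 0.3, 0.55, 0.92, 0.95, 0.97, 0.99]
used, total = early_exit_generate(scores, threshold=0.9)
savings = 1 - used / total  # fraction of thinking steps skipped
```

With these toy scores the loop exits at step 4 of 7, skipping roughly 43% of the thinking steps, which is in the same ballpark as the 14-55% cost reductions the paper reports.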
From the abstract
Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, spending excessive compute even after the answer is generated early on. Prior work has identified the existence of an optimal reasoning length such that truncating reasoning at this point significantly shortens CoT