Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.
arXiv · March 16, 2026 · 2603.12529
Why it matters
Reasoning models like o1 often overthink, continuing to spend tokens on reasoning after the answer has already been formed. TERMINATOR predicts when the model is done, allowing for significant inference-cost savings while maintaining or even improving accuracy.
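The idea can be sketched as an early-exit loop: after each thinking step, a predictor scores how likely it is that the answer is already formed, and generation stops once that score crosses a threshold. This is a minimal illustration, not TERMINATOR's actual predictor; the `early_exit_generate` function, the per-step confidence scores, and the threshold are all hypothetical stand-ins.

```python
def early_exit_generate(step_confidences, threshold=0.9):
    """Consume per-step 'answer is ready' scores (hypothetical probe outputs);
    stop at the first score >= threshold.
    Returns (steps_used, total_steps_available)."""
    total = len(step_confidences)
    for i, conf in enumerate(step_confidences, start=1):
        if conf >= threshold:
            # Early exit: the remaining thinking tokens are skipped.
            return i, total
    # No exit point predicted; the full chain-of-thought is used.
    return total, total

# Toy example: confidence rises as the reasoning converges on an answer.
scores = [0.1, 0.3, 0.55, 0.92, 0.95, 0.97, 0.99]
used, total = early_exit_generate(scores, threshold=0.9)
savings = 1 - used / total  # fraction of thinking steps skipped
```

With these toy scores the loop exits at step 4 of 7, skipping roughly 43% of the thinking steps, which is in the same ballpark as the 14-55% cost reductions the paper reports.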
From the abstract
Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, spending excessive compute even after the answer is generated early on. Prior work has identified the existence of an optimal reasoning length such that truncating reasoning at this point significantly shortens CoT