A training-free metacognitive framework that gives LLMs explicit control over expanding, pruning, and repairing reasoning trajectories during inference.
March 31, 2026
Original Paper
CoT2-Meta: Budgeted Metacognitive Control for Test-Time Reasoning
arXiv · 2603.28135
The Takeaway
CoT2-Meta moves beyond plain Best-of-N sampling by using a meta-controller to allocate the search budget dynamically. Its reported scores of 92.8 on MATH and 48.8 on HLE, achieved without retraining, suggest that explicit control over the search is an effective way to scale test-time compute.
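To make the contrast with Best-of-N concrete, here is a minimal Python sketch of a budgeted control loop that can expand, prune, repair, or abstain. It is an illustration under stated assumptions, not the paper's algorithm: the `Trajectory` class, the `controlled_search` function, the three thresholds, and the toy `toy_expand` and `toy_repair` callbacks are all hypothetical names invented for this example. In a real system the callbacks would presumably wrap LLM calls and a learned or prompted scorer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """A partial reasoning chain with a hypothetical confidence score in [0, 1]."""
    steps: List[str] = field(default_factory=list)
    score: float = 0.5
    done: bool = False


def controlled_search(
    root: Trajectory,
    expand: Callable[[Trajectory], List[Trajectory]],
    repair: Callable[[Trajectory], Trajectory],
    budget: int,
    prune_below: float = 0.2,   # assumed threshold, not from the paper
    repair_below: float = 0.4,  # assumed threshold, not from the paper
    answer_above: float = 0.7,  # assumed threshold, not from the paper
) -> Optional[Trajectory]:
    """Spend at most `budget` expand/repair calls.

    Returns the best finished trajectory, or None to signal abstention.
    """
    frontier = [root]
    finished: List[Trajectory] = []
    while budget > 0 and frontier:
        # Prune: drop partial trajectories scored as hopeless.
        frontier = [t for t in frontier if t.score >= prune_below]
        if not frontier:
            break
        # Select the most promising partial trajectory.
        frontier.sort(key=lambda t: t.score, reverse=True)
        current = frontier.pop(0)
        budget -= 1
        if current.score < repair_below:
            # Repair: try to fix a shaky trajectory instead of extending it.
            frontier.append(repair(current))
            continue
        # Expand: generate continuations and file them by completion state.
        for child in expand(current):
            (finished if child.done else frontier).append(child)
    best = max(finished, key=lambda t: t.score, default=None)
    # Abstain when nothing finished or the best answer is not confident enough.
    if best is None or best.score < answer_above:
        return None
    return best


def toy_expand(t: Trajectory) -> List[Trajectory]:
    """Toy stand-in for LLM sampling: one stronger and one weaker continuation."""
    depth = len(t.steps)
    return [
        Trajectory(t.steps + [f"good-{depth}"], min(1.0, t.score + 0.2), depth >= 1),
        Trajectory(t.steps + [f"weak-{depth}"], t.score - 0.4, depth >= 1),
    ]


def toy_repair(t: Trajectory) -> Trajectory:
    """Toy stand-in for a repair step: replace the last step and raise the score."""
    return Trajectory(t.steps[:-1] + ["repaired"], t.score + 0.3, t.done)


result = controlled_search(Trajectory(), toy_expand, toy_repair, budget=4)
```

With a budget of 4 the toy run finishes after two expansions and returns the higher-scoring chain; with a budget of 1 nothing finishes, so the function returns `None`, which stands in for abstaining. The point of the design is that every expand or repair call is charged against a single budget, whereas Best-of-N spends its whole budget on independent full samples.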
From the abstract
Recent test-time reasoning methods improve performance by generating more candidate chains or searching over larger reasoning trees, but they typically lack explicit control over when to expand, what to prune, how to repair, and when to abstain. We introduce CoT2-Meta, a training-free metacognitive reasoning framework that combines object-level chain-of-thought generation with meta-level control over partial reasoning trajectories. The framework integrates four components: strategy-conditioned t