TARo introduces a learnable token-level router that steers frozen LLMs toward structured reasoning at test-time without retraining.
arXiv · March 20, 2026 · 2603.18411
The Takeaway
TARo extends test-time alignment from simple preference optimization to complex mathematical and clinical reasoning. Its ability to generalize from small to large backbones without retraining makes it a powerful, lightweight tool for enhancing reasoning in frozen production models.
From the abstract
Large language models (LLMs) exhibit strong reasoning capabilities but typically require expensive post-training to reach high performance. Recent test-time alignment methods offer a lightweight alternative, but have been explored mainly for preference alignment rather than reasoning. To bridge this gap, we propose Token-level Adaptive Routing (TARo), which steers frozen LLMs toward structured reasoning entirely at inference time. Specifically, we first train reward models on step-wise mathemat…
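The excerpt cuts off before the method details, but the general idea behind this family of test-time alignment approaches, re-weighting a frozen model's next-token distribution with a small learned scorer at decode time, can be sketched in a toy form. Everything below is illustrative: the token vocabulary, the router bonuses, and the combination rule are assumptions for the sketch, not the authors' actual implementation.

```python
import math

# Toy stand-in for a frozen LM: fixed next-token logits for one decoding step.
# In the real setting these come from the backbone model, which is never updated.
frozen_lm_logits = {"answer": 2.0, "therefore": 1.5, "lol": 1.8}

# Toy stand-in for a learned token-level router: a per-token bonus that favors
# structured-reasoning tokens. These values are made up for illustration.
router_bonus = {"answer": 0.0, "therefore": 1.0, "lol": -1.0}

def route_next_token(logits, bonus, beta=1.0):
    """Combine frozen-LM logits with router scores; return (token, probs)."""
    combined = {t: v + beta * bonus.get(t, 0.0) for t, v in logits.items()}
    z = sum(math.exp(v) for v in combined.values())
    probs = {t: math.exp(v) / z for t, v in combined.items()}
    return max(probs, key=probs.get), probs

token, probs = route_next_token(frozen_lm_logits, router_bonus)
print(token)  # → therefore
```

Without the router, the frozen model's top token here would be "answer" (logit 2.0); the router's bonus shifts the combined distribution toward the reasoning-style token "therefore" (2.5 after re-weighting) while leaving the backbone's weights untouched, which is the essence of steering at inference time.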