AI & ML Efficiency Breakthrough

AdaAnchor enables LLMs to perform multi-step reasoning entirely in latent space with an adaptive halting mechanism to optimize compute.

March 17, 2026

Original Paper

Thinking in Latents: Adaptive Anchor Refinement for Implicit Reasoning in LLMs

Disha Sheshanarayana, Rajat Subhra Pal, Manjira Sinha, Tirthankar Dasgupta

arXiv · 2603.15051

The Takeaway

AdaAnchor addresses the heavy overhead of chain-of-thought token generation by shifting reasoning into hidden representations. Its adaptive halting mechanism reduces latent refinement steps by up to 60% compared to fixed-step methods, making 'internal thinking' practical for real-time inference.
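The core idea of adaptive halting over latent refinement steps can be illustrated with a minimal sketch. The function names (`refine`, `halt_score`) and the cumulative-threshold stopping rule are illustrative assumptions, not the paper's actual method:

```python
# Hypothetical sketch: iterate a latent refinement step and stop early once
# an accumulated halting score passes a threshold (ACT-style early exit).
# `refine` and `halt_score` are placeholder callables, not from the paper.
import numpy as np

def latent_reasoning(h, refine, halt_score, max_steps=8, threshold=0.9):
    """Refine hidden state h until the cumulative halt score passes threshold."""
    cum_halt = 0.0
    for step in range(1, max_steps + 1):
        h = refine(h)              # one latent refinement step
        cum_halt += halt_score(h)  # scalar "confidence" contribution in [0, 1]
        if cum_halt >= threshold:  # halt early instead of running all steps
            break
    return h, step

# Toy demo: refinement shrinks the state toward zero, and the halting
# score grows as the state's norm shrinks, so the loop exits early.
rng = np.random.default_rng(0)
h0 = rng.normal(size=4)
refine = lambda h: 0.5 * h
halt_score = lambda h: float(1.0 / (1.0 + np.linalg.norm(h)))
h, steps = latent_reasoning(h0, refine, halt_score)
```

In this toy setup the loop terminates after only a couple of steps rather than the fixed maximum, which is the mechanism behind the reported reduction in refinement steps.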

From the abstract

Token-level Chain-of-Thought (CoT) prompting has become a standard way to elicit multi-step reasoning in large language models (LLMs), especially for mathematical word problems. However, generating long intermediate traces increases output length and inference cost, and can be inefficient when the model could arrive at the correct answer without extensive verbalization. This has motivated latent-space reasoning approaches that shift computation into hidden representations and only emit a final answer.