AI & ML New Capability

High-quality oversight of massive proprietary LLM agents can be achieved by small, open-source 'critics' that intervene in real-time within the same interaction.

April 2, 2026

Original Paper

Asymmetric Actor-Critic for Multi-turn LLM Agents

Shuli Jiang, Zhaoyang Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano Soatto

arXiv · 2604.00304

The Takeaway

This asymmetric actor-critic approach allows for reliable deployment of black-box models in multi-turn tasks without the overhead of multi-step reflection. It demonstrates that runtime supervision is a viable, compute-efficient alternative to fine-tuning or zero-shot agentic loops.

From the abstract

Large language models (LLMs) exhibit strong reasoning and conversational abilities, but ensuring reliable behavior in multi-turn interactions remains challenging. In many real-world applications, agents must succeed in one-shot settings where retries are impossible. Existing approaches either rely on reflection or post-hoc evaluation, which require additional attempts, or assume fully trainable models that cannot leverage proprietary LLMs. We propose an asymmetric actor-critic framework for reli