AI & ML Efficiency Breakthrough

PivotRL identifies 'pivot' turns in agent trajectories where actions matter most, enabling compute-efficient reinforcement learning that matches end-to-end RL at 4x lower cost.

March 24, 2026

Original Paper

PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost

Junkeun Yi, Damon Mosk-Aoyama, Baihe Huang, Ritu Gala, Charles Wang, Sugam Dipak Devare, Khushi Bhardwaj, Abhibha Gupta, Oleksii Kuchaiev, Jiantao Jiao, Jian Zhang, Venkat Srinivasan

arXiv · 2603.21383

The Takeaway

By focusing rollouts on high-variance decision points and rewarding functional equivalence, PivotRL closes the efficiency gap in agentic post-training. It is already used in production-scale models such as NVIDIA's Nemotron-3-Super.

From the abstract

Post-training for long-horizon agentic tasks faces a tension between compute efficiency and generalization. While supervised fine-tuning (SFT) is compute efficient, it often suffers from out-of-domain (OOD) degradation. Conversely, end-to-end reinforcement learning (E2E RL) preserves OOD capabilities but incurs high compute costs due to many turns of on-policy rollout. We introduce PivotRL, a novel framework that operates on existing SFT trajectories to combine the compute efficiency of SFT with …