AI & ML Efficiency Breakthrough

Introduces Heddle, a trajectory-centric system that tackles the long-tail latency bottleneck caused by tool calls in agentic Reinforcement Learning.

March 31, 2026

Original Paper

Heddle: A Distributed Orchestration System for Agentic RL Rollout

Zili Zhang, Yinmin Zhong, Chengxu Yang, Chao Jin, Bingyang Wu, Xinming Wei, Yuliang Liu, Xin Jin

arXiv · 2603.28101

The Takeaway

Agentic RL is often throttled by unpredictable external tool calls; Heddle uses trajectory-level scheduling and dynamic model parallelism to achieve 2.5x higher rollout throughput. For practitioners training LLM agents at scale, this is a significant infrastructure advance.
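The excerpt does not spell out Heddle's scheduling policy, but the core idea of trajectory-level (rather than step-level) scheduling for long-tail mitigation can be sketched with a simple longest-remaining-first dispatcher. All class and function names below are hypothetical illustrations, not Heddle's API.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class _Entry:
    # Negated so the heap pops the trajectory with the most
    # estimated remaining steps first.
    neg_est_remaining: int
    traj_id: str = field(compare=False)

class TrajectoryScheduler:
    """Illustrative trajectory-level scheduler: dispatch the trajectory
    with the largest estimated remaining work first, so long-tail
    trajectories start early and the batch finishes sooner."""

    def __init__(self):
        self._heap = []

    def submit(self, traj_id: str, est_remaining_steps: int):
        heapq.heappush(self._heap, _Entry(-est_remaining_steps, traj_id))

    def next_to_run(self):
        return heapq.heappop(self._heap).traj_id if self._heap else None

sched = TrajectoryScheduler()
sched.submit("t1", 3)
sched.submit("t2", 12)  # the long-tail straggler
sched.submit("t3", 5)
print(sched.next_to_run())  # the straggler "t2" is dispatched first
```

A step-centric scheduler would treat each LLM decode step independently and lose this information; scheduling at trajectory granularity is what lets the system prioritize stragglers at all.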

From the abstract

Agentic Reinforcement Learning (RL) enables LLMs to solve complex tasks by alternating between a data-collection rollout phase and a policy training phase. During rollout, the agent generates trajectories, i.e., multi-step interactions between LLMs and external tools. Yet, frequent tool calls induce long-tailed trajectory generation that bottlenecks rollouts. This stems from step-centric designs that ignore trajectory context, triggering three system problems for long-tail trajectory generation:
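The rollout loop the abstract describes, a trajectory as a multi-step alternation of LLM generation and external tool calls, can be sketched as follows. Both `llm_generate` and `call_tool` are hypothetical stand-ins for illustration only; in a real system the tool call has unpredictable latency, which is the long-tail bottleneck Heddle targets.

```python
def llm_generate(context: str) -> str:
    # Placeholder policy: request a tool until two results are in context.
    steps = context.count("[tool_result]")
    return "TOOL: search" if steps < 2 else "FINAL: answer"

def call_tool(action: str) -> str:
    # Placeholder external tool; real tools block for an
    # unpredictable amount of time.
    return "[tool_result] ok"

def rollout(prompt: str, max_steps: int = 8) -> list:
    """Generate one trajectory: alternate LLM steps and tool calls
    until the LLM emits a final answer or the step budget runs out."""
    trajectory, context = [], prompt
    for _ in range(max_steps):
        action = llm_generate(context)
        if action.startswith("FINAL:"):
            trajectory.append((action, None))
            break
        observation = call_tool(action)
        trajectory.append((action, observation))
        context += " " + observation
    return trajectory

traj = rollout("solve the task")  # 2 tool steps, then a final answer
```

Because every trajectory's length and per-step tool latency differ, a batch of such rollouts finishes only when its slowest trajectory does, which is why a step-centric view that ignores trajectory context leaves throughput on the table.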