AI & ML Efficiency Breakthrough

RelayCaching eliminates redundant prefill computation in multi-agent systems by reusing the decoding-phase KV cache from previous agents.

March 17, 2026

Original Paper

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, Jiang Liu

arXiv · 2603.13289

The Takeaway

RelayCaching achieves up to a 4.7x reduction in Time-to-First-Token (TTFT) on collaborative LLM tasks, and it requires no training. This is a critical win for multi-agent workflows where agents build on each other's generated outputs.
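To see where the savings come from, here is a toy sketch (not the paper's implementation) of an agent pipeline with a purely schematic cost model: each token either has a cached KV entry or must be recomputed at prefill. All names are hypothetical, and real transformer KV entries depend on position and context, which this toy ignores.

```python
# Toy cost model: KV entries keyed by token identity (a simplification --
# real KV caches are position- and context-dependent).

def run_agent(prompt_tokens, kv_cache=None):
    """Simulate one agent turn; return (output_tokens, kv_cache, prefill_cost)."""
    kv_cache = dict(kv_cache or {})
    # Prefill: compute KV entries only for tokens not already cached.
    new_tokens = [t for t in prompt_tokens if t not in kv_cache]
    for t in new_tokens:
        kv_cache[t] = f"kv({t})"          # stand-in for real key/value tensors
    prefill_cost = len(new_tokens)
    # Decode: generated tokens enter the KV cache as a side effect of decoding.
    output = [f"out-{t}" for t in prompt_tokens[-2:]]
    for t in output:
        kv_cache[t] = f"kv({t})"
    return output, kv_cache, prefill_cost

shared = ["sys", "task", "ctx"]

# Baseline: agent B re-prefills everything, including agent A's output.
out_a, cache_a, cost_a = run_agent(shared)
_, _, cost_b_baseline = run_agent(shared + out_a)

# RelayCaching-style reuse: agent B inherits A's cache, which already holds
# KV entries for both the shared prompt and A's decoded tokens.
_, _, cost_b_relay = run_agent(shared + out_a, kv_cache=cache_a)

print(cost_b_baseline, cost_b_relay)  # -> 5 0
```

In the baseline, agent B recomputes KV entries for all five prompt tokens; with the relayed cache, its prefill cost drops to zero, which is the source of the TTFT reduction the paper reports.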

From the abstract

The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. However, these collaborative architectures introduce a critical bottleneck: redundant prefill computation for shared content generated by previous agents, which significantly increases KV cache memory usage and time-to-first-token (TTFT). While various KV cache methods have been proposed to mitigate prefill redundancy, they either fail to maintain accuracy […]