RelayCaching eliminates redundant prefill computation in multi-agent systems by reusing the decoding-phase KV cache from previous agents.
March 17, 2026
Original Paper
RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
arXiv · 2603.13289
The Takeaway
RelayCaching achieves up to a 4.7x reduction in time-to-first-token (TTFT) on collaborative LLM tasks, and it requires no training. This is a critical win for multi-agent workflows where agents build upon each other's generated outputs.
From the abstract
The increasing complexity of AI tasks has shifted the paradigm from monolithic models toward multi-agent large language model (LLM) systems. However, these collaborative architectures introduce a critical bottleneck: redundant prefill computation for shared content generated by previous agents, which significantly increases KV cache memory usage and time-to-first-token (TTFT). While various KV cache methods have been proposed to mitigate prefill redundancy, they either fail to maintain accuracy …
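The core idea can be illustrated with a toy model. The sketch below is not the paper's implementation; all names and the cost model are illustrative assumptions. It shows why a downstream agent that inherits the KV cache an upstream agent built while decoding only needs to prefill its own new suffix, rather than recomputing attention states for the shared prefix.

```python
# Toy model of prefill cost with and without relaying a previous agent's
# decoding-phase KV cache. Names and structure are illustrative, not the
# paper's actual implementation.

def prefill_cost(tokens, kv_cache=None):
    """Return how many token positions must be prefilled.

    `kv_cache` maps a token position to its stored key/value pair;
    positions already cached are skipped, so only the uncached suffix
    incurs prefill computation.
    """
    kv_cache = kv_cache or {}
    return sum(1 for pos in range(len(tokens)) if pos not in kv_cache)

# Agent A generates a shared plan; decoding it populates A's KV cache.
shared_output = ["step1", "step2", "step3", "step4"]
kv_cache = {pos: ("K", "V") for pos in range(len(shared_output))}  # toy entries

# Agent B's input = Agent A's output plus B's own short continuation.
agent_b_input = shared_output + ["refine", "answer"]

cost_without_reuse = prefill_cost(agent_b_input)            # recompute everything
cost_with_reuse = prefill_cost(agent_b_input, kv_cache)     # relay the cached prefix

print(cost_without_reuse, cost_with_reuse)  # 6 2
```

In this toy setup the shared prefix dominates the input, so relaying the cache cuts prefill from 6 positions to 2; the paper's reported TTFT gains come from the same effect at real sequence lengths.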