The ICaRus architecture allows multiple models to share a single, frozen KV cache for the same prompt.
March 17, 2026
Original Paper
ICaRus: Identical Cache Reuse for Efficient Multi Model Inference
arXiv · 2603.13281
The Takeaway
In agentic systems involving multiple model calls, KV cache redundancy is a massive memory bottleneck. By decoupling Transformers into shared encoders and specialized decoders, ICaRus eliminates recomputation and cache explosion, making multi-model workflows significantly more scalable.
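The sharing idea can be sketched with a toy cache store. This is a minimal illustration under assumed names (`SharedKVStore`, `decode` are hypothetical, not the paper's API): one shared prefill populates a frozen KV cache, and several specialized "decoders" reuse it instead of each recomputing the identical prompt.

```python
# Hypothetical sketch of shared-cache reuse; not the ICaRus implementation.
from dataclasses import dataclass, field

@dataclass
class SharedKVStore:
    """Maps a prompt to the KV cache produced by one shared prefill pass."""
    _cache: dict = field(default_factory=dict)
    prefill_count: int = 0  # how many times prefill actually ran

    def get_or_prefill(self, prompt: str) -> list:
        if prompt not in self._cache:
            self.prefill_count += 1
            # Stand-in for the shared encoder's prefill:
            # one (key, value) pair per prompt token.
            self._cache[prompt] = [(tok, tok.upper()) for tok in prompt.split()]
        return self._cache[prompt]  # frozen: decoders read it, never write

def decode(model_name: str, prompt: str, store: SharedKVStore) -> str:
    # Each specialized decoder reuses the cached prefix; no per-model prefill.
    kv = store.get_or_prefill(prompt)
    return f"{model_name} decoded over {len(kv)} cached positions"

store = SharedKVStore()
prompt = "summarize the meeting notes"
outputs = [decode(m, prompt, store) for m in ("planner", "coder", "critic")]
print(store.prefill_count)  # prints 1: three models, one prefill
```

Three model calls touch the same prompt, but the prefill runs once, which is the memory-and-recompute saving the Takeaway describes.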
From the abstract
Multi-model inference has recently emerged as a prominent paradigm, particularly in the development of agentic AI systems. However, in such scenarios, each model must maintain its own Key-Value (KV) cache for the identical prompt, leading to substantial memory consumption. This explosive growth of KV caches forces LLM serving systems to evict previously stored caches, which in turn introduces significant recomputation overhead whenever the evicted caches are required again. Moreover, prefix cach…