A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves up to 5.79x lower tail latency via component disaggregation.
arXiv · March 13, 2026 · 2603.12118
Why it matters
Any-to-Any models have complex, modality-dependent computation graphs that choke standard serving engines; this system provides the necessary infrastructure for deploying the next generation of truly multimodal architectures at scale.
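To make "modality-dependent computation graph" concrete, here is a minimal Python sketch of how two requests to the same model might exercise different components. It is an illustration only: the component names, the `Request` type, and `plan_path` are hypothetical and are not taken from the paper or from the system's API. The sketch also assumes that text needs no dedicated encoder or generator.

```python
from dataclasses import dataclass

# Hypothetical component names, chosen for illustration only.
ENCODERS = {"image": "image_encoder", "video": "video_encoder", "audio": "audio_encoder"}
GENERATORS = {"image": "image_generator", "video": "video_generator", "audio": "audio_generator"}
BACKBONE = "llm_backbone"


@dataclass(frozen=True)
class Request:
    inputs: frozenset   # input modalities, e.g. {"text", "image"}
    outputs: frozenset  # output modalities, e.g. {"audio"}


def plan_path(req: Request) -> list:
    """Return the components a request would visit, in order.

    Assumption: text is handled by the backbone directly, so only
    non-text modalities add an encoder or generator to the path.
    """
    path = [ENCODERS[m] for m in sorted(req.inputs) if m in ENCODERS]
    path.append(BACKBONE)
    path += [GENERATORS[m] for m in sorted(req.outputs) if m in GENERATORS]
    return path


text_only = Request(frozenset({"text"}), frozenset({"text"}))
image_to_audio = Request(frozenset({"text", "image"}), frozenset({"audio"}))

print(plan_path(text_only))
print(plan_path(image_to_audio))
```

In this toy model the text-only request touches a single component while the image-to-audio request touches three. If every component lived in one monolithic replica, the encoder and generator could not be scaled independently of the backbone, which is the motivation for disaggregating components.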
From the abstract
Any-to-Any models are an emerging class of multimodal models that accept combinations of multimodal data (e.g., text, image, video, audio) as input and generate them as output. Serving these models is challenging; different requests with different input and output modalities traverse different paths through the model computation graph, and each component of the model has different scaling characteristics. We present Cornserve, a distributed serving system for generic Any-to-Any models. Cornserv