AI & ML Efficiency Breakthrough

A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.

March 24, 2026

Original Paper

LLM Router: Prefill is All You Need

Tanay Varshney, Annie Surla, Michelle Xu, Gomathy Venkata Krishnan, Maximilian Jeblick, David Austin, Neal Vaidya, Davide Onofrio

arXiv · 2603.20895

The Takeaway

The paper moves beyond semantic routing to 'Encoder-Target Decoupling': the internal state of a cheap model is used to predict how an expensive one will perform. This makes it a practical option for serving heterogeneous models in production.
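To make the idea concrete, here is a minimal sketch of routing on encoder activations. It is not the paper's implementation. It assumes that each candidate target model has a small logistic probe that maps an encoder activation vector to a predicted probability of answering correctly, and that the router picks the target with the highest prediction. The names `predict_success`, `route`, `target_a`, and `target_b`, along with all weights and the activation vector, are hypothetical values chosen for illustration.

```python
import numpy as np

def predict_success(activation, weights, bias):
    """Logistic probe: predicted probability that a target answers correctly."""
    logit = float(activation @ weights + bias)
    return 1.0 / (1.0 + np.exp(-logit))

def route(activation, probes):
    """Score every target with its own probe and return the best-scoring one."""
    scores = {
        name: predict_success(activation, w, b)
        for name, (w, b) in probes.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical stand-in for a prefill activation taken from the encoder model.
activation = np.array([1.0, -2.0, 0.5])

# Hypothetical probe parameters, one (weights, bias) pair per target model.
probes = {
    "target_a": (np.array([0.5, 0.0, 1.0]), 0.0),    # logit = 1.0
    "target_b": (np.array([0.0, -1.0, 0.0]), -0.5),  # logit = 1.5
}

choice, scores = route(activation, probes)
print(choice, scores)
```

The point of the decoupling is visible in the signature of `route`: the activation comes from one model, while the probes describe others, so the expensive targets never need to run before the routing decision is made.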

From the abstract

LLMs often share comparable benchmark accuracies, but their complementary performance across task subsets suggests that an Oracle router--a theoretical selector with perfect foresight--can significantly surpass standalone model accuracy by navigating model-specific strengths. While current routers rely on fragile semantic signals, we propose using internal prefill activations via Encoder-Target Decoupling--a functional separation between the model providing the predictive signal (the Encoder) an
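The Oracle router and the "share of the oracle gap captured" figure can be illustrated with a short calculation. The sketch below assumes the gap is defined as Oracle accuracy minus the accuracy of the best single model, and that the captured share is the router's improvement over that best single model divided by the gap. This definition is an assumption and may differ from the paper's. The correctness matrix and routing decisions are toy values made up for illustration, not results from the paper.

```python
import numpy as np

def oracle_gap_captured(correct, routed):
    """correct: (n_queries, n_models) 0/1 matrix of per-model correctness.
    routed: (n_queries,) index of the model the router chose for each query."""
    correct = np.asarray(correct, dtype=float)
    routed = np.asarray(routed)
    best_single = correct.mean(axis=0).max()   # best standalone model
    oracle = correct.max(axis=1).mean()        # correct if any model is correct
    router = correct[np.arange(len(routed)), routed].mean()
    gap = oracle - best_single
    captured = (router - best_single) / gap if gap > 0 else float("nan")
    return best_single, oracle, router, captured

# Toy data (assumed): two models with equal accuracy but complementary strengths.
correct = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
routed = [0, 0, 1, 0, 0, 0, 1, 1]

best_single, oracle, router, captured = oracle_gap_captured(correct, routed)
print(best_single, oracle, router, captured)
```

In this toy case both models score 0.5 on their own, the Oracle reaches 0.875, and the imperfect router reaches 0.625, which is one third of the gap between the best single model and the Oracle.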
