A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.
March 24, 2026
Original Paper
LLM Router: Prefill is All You Need
arXiv · 2603.20895
The Takeaway
The paper moves beyond semantic routing to 'encoder-target decoupling': the internal state of a cheap model is used to predict how well an expensive one will perform. This makes it a practical method for heterogeneous model serving in production.
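To make the decoupling idea concrete, here is a minimal sketch on synthetic data. It is not the paper's implementation. The "activations" are random vectors standing in for a cheap encoder's prefill state, the correctness labels are simulated, and the logistic-regression probe is a substitute chosen for brevity. The gap metric assumes "oracle accuracy gap" means the distance between the best standalone model and the oracle router. Cost is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): in practice `acts` would hold prefill
# activations from a cheap encoder model, and `correct[i, m]` would record
# whether target model m answered prompt i correctly.
n_prompts, dim, n_models = 600, 16, 3
acts = rng.normal(size=(n_prompts, dim))
true_w = rng.normal(size=(dim, n_models))
noise = rng.normal(size=(n_prompts, n_models))
correct = (acts @ true_w + noise > 0).astype(float)

train, test = slice(0, 400), slice(400, None)

def fit_probe(x, y, steps=500, lr=0.1):
    """Logistic-regression probe trained with plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# One probe per target model, all reading the same encoder activations.
probes = [fit_probe(acts[train], correct[train, m]) for m in range(n_models)]

# Predicted success score of each target model on each held-out prompt.
scores = np.column_stack([acts[test] @ w + b for w, b in probes])
choice = scores.argmax(axis=1)  # route each prompt to the top-scoring model

test_correct = correct[test]
router_acc = test_correct[np.arange(len(choice)), choice].mean()
best_single_acc = test_correct.mean(axis=0).max()  # best standalone model
oracle_acc = test_correct.max(axis=1).mean()       # perfect-foresight router

# Assumed definition: share of the best-single-to-oracle gap that is closed.
gap_captured = (router_acc - best_single_acc) / (oracle_acc - best_single_acc)
print(f"best single {best_single_acc:.3f}, router {router_acc:.3f}, "
      f"oracle {oracle_acc:.3f}, gap captured {gap_captured:.2f}")
```

The oracle counts a prompt as solved if any model solves it, so it upper-bounds every router built over the same model pool. The fraction printed at the end is one plausible reading of the kind of "gap captured" figure quoted above.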
From the abstract
LLMs often share comparable benchmark accuracies, but their complementary performance across task subsets suggests that an Oracle router--a theoretical selector with perfect foresight--can significantly surpass standalone model accuracy by navigating model-specific strengths. While current routers rely on fragile semantic signals, we propose using internal prefill activations via Encoder-Target Decoupling--a functional separation between the model providing the predictive signal (the Encoder) an