Enables precise prompt routing by predicting the reward a model is expected to earn on a prompt, before any response is generated.
March 24, 2026
Original Paper
Expected Reward Prediction, with Applications to Model Routing
arXiv · 2603.20217
The Takeaway
Practitioners can now systematically route queries to the cheapest model likely to succeed based on predicted reward, rather than relying on category-level heuristics or expensive multi-model sampling.
From the abstract
Reward models are a standard tool for scoring responses from LLMs. They are built to rank responses to a fixed prompt sampled from a single model, for example to choose the best of n sampled responses. In this paper, we study whether scores from response-level reward models can be lifted to score a model's suitability for a prompt, prior to seeing responses from that model. Specifically, we show that it is straightforward to predict the expected reward that an LLM would earn from the reward model
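The routing idea described above can be sketched in a few lines: predict each candidate model's expected reward for the incoming prompt, then send the query to the cheapest model whose prediction clears a quality bar. This is a minimal illustration, not the paper's implementation; the `ModelOption` type, the `route` function, and all cost and reward numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str
    cost: float              # illustrative relative cost per query
    predicted_reward: float  # expected reward predicted from the prompt alone

def route(options: list[ModelOption], threshold: float) -> ModelOption:
    """Pick the cheapest model whose predicted expected reward clears the bar.

    Falls back to the highest-predicted-reward model when none clears it.
    """
    viable = [o for o in options if o.predicted_reward >= threshold]
    if viable:
        return min(viable, key=lambda o: o.cost)
    return max(options, key=lambda o: o.predicted_reward)

# Hypothetical predictions for a single prompt across three model tiers.
options = [
    ModelOption("small", cost=1.0, predicted_reward=0.55),
    ModelOption("medium", cost=4.0, predicted_reward=0.78),
    ModelOption("large", cost=20.0, predicted_reward=0.91),
]
chosen = route(options, threshold=0.7)
print(chosen.name)  # "medium": the cheapest model predicted to clear the bar
```

The key contrast with best-of-n sampling is that no model generates a response before the decision: the predictor scores the prompt alone, so only the one selected model is ever invoked.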