AI & ML · New Capability

A production-ready adaptive router for LLM portfolios that manages cost-quality trade-offs in real time under strict dollar budgets.

April 2, 2026

Original Paper

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving

Annette Taberner-Miller

arXiv · 2604.00136

The Takeaway

The authors present this as the first open-source router that adapts online to non-stationary conditions, such as pricing shifts or silent model regressions, while enforcing a dollar-denominated cost ceiling. That makes multi-model deployment (e.g., GPT-4o mixed with Haiku) viable for budget-constrained production apps.
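
To make the idea of routing under a cost ceiling while adapting to drift concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a simple epsilon-greedy bandit with no prompt context, a constant step size so that old rewards fade, and a cost penalty that rises when spending runs ahead of a per-request budget. The names `Arm`, `PacedRouter`, and `simulate`, along with every price and success rate, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Arm:
    name: str
    cost: float           # assumed dollars per request (illustrative only)
    quality: float = 0.5  # running reward estimate in [0, 1]


@dataclass
class PacedRouter:
    arms: list
    budget: float              # target average dollars per request
    forgetting: float = 0.05   # constant step size, so old rewards fade
    pace_step: float = 0.05    # how quickly the cost penalty reacts
    epsilon: float = 0.1       # exploration rate
    penalty: float = 0.0       # current price placed on normalized cost
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def _max_cost(self):
        return max(a.cost for a in self.arms)

    def choose(self):
        # Explore occasionally so a regressed or repriced model is noticed.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Otherwise pick the best estimated quality net of the cost penalty.
        top = self._max_cost()
        return max(self.arms, key=lambda a: a.quality - self.penalty * a.cost / top)

    def update(self, arm, reward):
        # Exponential forgetting: recent feedback outweighs old feedback.
        arm.quality += self.forgetting * (reward - arm.quality)
        # Raise the penalty after an over-budget request, lower it otherwise.
        overspend = (arm.cost - self.budget) / self._max_cost()
        self.penalty = max(0.0, self.penalty + self.pace_step * overspend)


def simulate(steps=5000, seed=1):
    true_quality = {"cheap": 0.6, "premium": 0.9}  # made-up success rates
    router = PacedRouter(
        arms=[Arm("cheap", 0.001), Arm("premium", 0.05)],
        budget=0.01,
    )
    world = random.Random(seed)
    spent = 0.0
    for _ in range(steps):
        arm = router.choose()
        reward = 1.0 if world.random() < true_quality[arm.name] else 0.0
        router.update(arm, reward)
        spent += arm.cost
    return spent / steps, router
```

Calling `simulate()` returns the average spend per request and the final router state. In this sketch the penalty only paces the average spend toward the budget over many requests. A hard ceiling of the kind described above would need an additional guard, for example refusing the premium arm once the remaining budget is exhausted.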

From the abstract

Production LLM serving often relies on multi-model portfolios spanning a ~530x cost range, where routing decisions trade off quality against cost. This trade-off is non-stationary: providers revise pricing, model quality can regress silently, and new models must be integrated without downtime. We present ParetoBandit, an open-source adaptive router built on cost-aware contextual bandits that is the first to simultaneously enforce dollar-denominated budgets, adapt online to such shifts, and onboa