Achieves significant tool-selection accuracy gains in LLM semantic routers with zero added serving-time latency or cost.
March 17, 2026
Original Paper
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
arXiv · 2603.13426
The Takeaway
OATS refines tool embeddings offline by pulling them toward the 'success centroids' of queries each tool has historically handled well, improving NDCG@5 on MetaTool from 0.869 to 0.940. It demonstrates that expensive LLM re-ranking can be replaced by zero-cost embedding refinement in production-grade agentic gateways.
From the abstract
Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413 A
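The core operation described in the abstract, nudging each tool embedding toward the centroid of the queries where that tool historically succeeded, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the mixing weight `alpha`, the unit-normalization step, and the function name are assumptions, since the excerpt does not specify them.

```python
import numpy as np

def refine_tool_embedding(tool_emb, success_query_embs, alpha=0.3):
    """Offline refinement of one tool embedding (hypothetical sketch).

    tool_emb:           unit-norm embedding of the tool description
    success_query_embs: embeddings of queries this tool answered well
    alpha:              assumed interpolation weight (not given in the excerpt)
    """
    # Centroid of historically successful queries, normalized for cosine routing.
    centroid = np.mean(success_query_embs, axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Interpolate toward the success centroid; re-normalize so serving-time
    # cosine similarity lookup is unchanged in cost.
    refined = (1 - alpha) * tool_emb + alpha * centroid
    return refined / np.linalg.norm(refined)

# Toy usage: a tool embedding pulled toward two queries it served well.
tool = np.array([1.0, 0.0])
queries = np.array([[0.8, 0.6], [0.6, 0.8]])
refined = refine_tool_embedding(tool, queries)
```

Because the refinement runs entirely offline, the router's serving path is unchanged: it still performs the same nearest-neighbor lookup over tool embeddings, just against the refined vectors.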