AI & ML Efficiency Breakthrough

Achieves significant tool-selection accuracy gains in LLM semantic routers with zero added serving-time latency or cost.

March 17, 2026

Original Paper

Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference

Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu

arXiv · 2603.13426

The Takeaway

OATS refines tool embeddings offline by interpolating them toward the centroids of queries where each tool historically succeeded, improving NDCG@5 on MetaTool from 0.869 to 0.940. It demonstrates that expensive LLM re-ranking can be replaced by zero-cost embedding refinement in production-grade agentic gateways.
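The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes cosine-similarity retrieval over unit-normalized embeddings, and the interpolation weight `alpha`, the function name, and the data layout are all hypothetical choices for exposition.

```python
import numpy as np

def refine_tool_embeddings(tool_embs, query_embs, successes, alpha=0.3):
    """Interpolate each tool embedding toward the centroid of the queries
    where that tool historically succeeded. Runs entirely offline, so it
    adds no parameters or latency at serving time.

    tool_embs:  (T, d) array of tool embeddings.
    query_embs: (Q, d) array of historical query embeddings.
    successes:  length-T list; successes[t] holds indices of queries
                where tool t led to a successful outcome.
    alpha:      interpolation weight (illustrative; the paper may differ).
    """
    refined = tool_embs.copy()
    for t, idx in enumerate(successes):
        if len(idx) == 0:
            continue  # no outcome signal for this tool; leave it unchanged
        centroid = query_embs[idx].mean(axis=0)
        refined[t] = (1 - alpha) * tool_embs[t] + alpha * centroid
        refined[t] /= np.linalg.norm(refined[t])  # renormalize for cosine retrieval
    return refined
```

At serving time the router simply runs its usual nearest-neighbor lookup against the refined embeddings, which is why the method adds no inference cost.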

From the abstract

Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413 A