Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.
arXiv · March 17, 2026 · 2603.14635
The Takeaway
Provides a practical blueprint for RAG pipelines: concentrating compute on deeper candidate pools and stronger re-ranking models yields significant performance gains, while complex query expansion strategies offer diminishing returns.
From the abstract
As agents operate over long horizons, their memory stores grow continuously, making retrieval critical to accessing relevant information. Many agent queries require reasoning-intensive retrieval, where the connection between query and relevant documents is implicit and requires inference to bridge. LLM-augmented pipelines address this through query expansion and candidate re-ranking, but introduce significant inference costs. We study computation allocation in reasoning-intensive retrieval pipelines…
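To make the compute-allocation idea concrete, here is a minimal, hypothetical sketch of the two-stage pipeline the abstract describes: a cheap first-stage retriever builds a candidate pool, and a more expensive re-ranker (stood in for here by bigram overlap, not an LLM) is applied only to the top `pool_depth` candidates. The corpus, query, and scoring functions are all illustrative assumptions, not from the paper; the point is that deepening the pool lets the re-ranker surface a document whose query connection the first stage cannot distinguish.

```python
def first_stage_score(query: str, doc: str) -> float:
    # Cheap lexical retriever: fraction of query tokens present in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank_score(query: str, doc: str) -> float:
    # Stand-in for an expensive re-ranker: bigram (phrase) overlap, which
    # captures word-order evidence the unigram first stage ignores.
    def bigrams(text: str) -> set:
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))
    q, d = bigrams(query), bigrams(doc)
    return len(q & d) / max(len(q), 1)

def retrieve(query: str, corpus: list, pool_depth: int, top_k: int = 1) -> list:
    # Stage 1: score the whole corpus cheaply, keep a pool of `pool_depth`.
    pool = sorted(corpus, key=lambda doc: first_stage_score(query, doc),
                  reverse=True)[:pool_depth]
    # Stage 2: spend the expensive re-ranking compute only on the pool.
    return sorted(pool, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:top_k]

corpus = [
    "retrieval of memory by an agent",
    "memory in the agent aids retrieval",
    "the agent memory retrieval problem",   # relevant, but tied in stage 1
    "unrelated cooking recipes",
]
query = "agent memory retrieval"
shallow = retrieve(query, corpus, pool_depth=2)  # pool too shallow: misses it
deep = retrieve(query, corpus, pool_depth=4)     # deeper pool: re-ranker finds it
```

With `pool_depth=2` the relevant document never reaches the re-ranker (all three on-topic documents tie at the unigram stage), while `pool_depth=4` lets the re-ranker promote it, mirroring the paper's finding that re-ranking depth matters more than first-stage sophistication.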