Reveals that Graph-RAG performance is limited by reasoning failures rather than retrieval, and shows how to make an 8B model match a 70B baseline.
March 17, 2026
Original Paper
The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA
arXiv · 2603.14045
The Takeaway
The paper identifies that 73% to 84% of Graph-RAG errors on multi-hop QA benchmarks are reasoning failures, not retrieval failures, and addresses this with SPARQL chain-of-thought prompting and 60% context compression. Practically, these augmentations let developers achieve frontier-level multi-hop QA results with significantly smaller, cheaper models at 12x lower cost.
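As an illustrative sketch of what a 60% context compression step could look like (the paper's actual compressor is not described here; the function name, the entity-overlap heuristic, and the ratio parameter are assumptions for illustration):

```python
def compress_context(sentences: list[str], query_entities: list[str],
                     keep_ratio: float = 0.4) -> list[str]:
    # Keep roughly keep_ratio of the retrieved sentences (a ~60% reduction),
    # preferring sentences that mention more of the query's entities.
    # Hypothetical heuristic, not the paper's method.
    scored = sorted(
        sentences,
        key=lambda s: sum(e.lower() in s.lower() for e in query_entities),
        reverse=True,
    )
    keep = max(1, round(len(sentences) * keep_ratio))
    return scored[:keep]

ctx = [
    "Christopher Nolan directed Inception.",
    "The film was released in 2010.",
    "Nolan was born in London.",
    "Warner Bros. distributed the movie.",
    "Inception stars Leonardo DiCaprio.",
]
kept = compress_context(ctx, ["Nolan", "Inception"])
print(len(kept))  # 2 of 5 sentences retained
```

The intuition behind compression helping an 8B model: with most distractor sentences removed, the reasoning chain the model must follow is shorter and less noisy.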
From the abstract
Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought …
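To make the first augmentation concrete, here is a minimal sketch of how a SPARQL chain-of-thought prompt over retrieved triples might be assembled. The function name, prompt wording, and triple format are assumptions for illustration, not the paper's exact template:

```python
def build_sparql_cot_prompt(question: str,
                            triples: list[tuple[str, str, str]]) -> str:
    # Present the retrieved knowledge-graph triples, then ask the model to
    # write a SPARQL-style query expressing the multi-hop reasoning chain
    # before stating the final answer. Hypothetical format.
    context = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return (
        "Knowledge-graph triples:\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "First write a SPARQL query that expresses the reasoning chain over "
        "these triples, then follow it step by step and give the final answer.\n"
        "SPARQL:"
    )

prompt = build_sparql_cot_prompt(
    "Which country is the director of Inception from?",
    [("Inception", "directed_by", "Christopher Nolan"),
     ("Christopher Nolan", "country_of_citizenship", "United Kingdom")],
)
```

The design idea is that forcing the model to externalize the hop structure as a query makes the reasoning explicit, which is exactly where the paper finds most Graph-RAG errors occur.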