AI & ML Efficiency Breakthrough

Reveals that Graph-RAG performance is limited by reasoning failure rather than retrieval, and shows how to make an 8B model match a 70B baseline.

March 17, 2026

Original Paper

The Reasoning Bottleneck in Graph-RAG: Structured Prompting and Context Compression for Multi-Hop QA

Yasaman Zarinkia, Venkatesh Srinivasan, Alex Thomo

arXiv · 2603.14045

The Takeaway

The paper finds that 73-84% of Graph-RAG errors are reasoning failures rather than retrieval misses, and addresses this with SPARQL chain-of-thought prompting and roughly 60% context compression. Practically, this lets developers reach frontier-level multi-hop QA accuracy with significantly smaller, cheaper models at 12x lower cost.
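To make the SPARQL chain-of-thought idea concrete, here is a minimal sketch of what such a prompt scaffold might look like. The function name, instruction wording, and example question are illustrative assumptions, not the paper's actual prompt: the point is that each reasoning hop is expressed as a SPARQL-style triple pattern over the retrieved graph context before the final answer.

```python
# Hypothetical sketch of SPARQL-style chain-of-thought prompting.
# The model is asked to write each reasoning hop as a SPARQL triple
# pattern grounded in the retrieved context, then state the answer.

def build_sparql_cot_prompt(question: str, context: str) -> str:
    """Assemble a prompt that elicits SPARQL-shaped reasoning steps."""
    instructions = (
        "Answer the question using ONLY the context below.\n"
        "For each reasoning hop, write a SPARQL triple pattern, e.g.\n"
        "  SELECT ?x WHERE { <Entity> <relation> ?x . }\n"
        "then bind ?x from the context. Finish with 'Answer: <value>'."
    )
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}\n"

prompt = build_sparql_cot_prompt(
    "Who directed the film that won Best Picture in 1995?",
    "Forrest Gump won Best Picture in 1995. "
    "Robert Zemeckis directed Forrest Gump.",
)
print(prompt)
```

The scaffold forces the model to externalize each hop as a structured query, which is where the paper locates the gains: retrieval already succeeds most of the time, so the fix targets the reasoning step.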

From the abstract

Graph-RAG systems achieve strong multi-hop question answering by indexing documents into knowledge graphs, but strong retrieval does not guarantee strong answers. Evaluating KET-RAG, a leading Graph-RAG system, on three multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA), we find that 77% to 91% of questions have the gold answer in the retrieved context, yet accuracy is only 35% to 78%, and 73% to 84% of errors are reasoning failures. We propose two augmentations: (i) SPARQL chain-of-thought […]
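The second augmentation, context compression, can be sketched as follows. This is not the paper's method, just a minimal illustration of the idea: drop retrieved sentences that share no content words with the question, shrinking the context the reasoning model must handle (the paper reports roughly 60% compression). All names, the stopword list, and the overlap heuristic are assumptions for illustration.

```python
# Minimal sketch of context compression (illustrative, not the
# paper's algorithm): keep only sentences that share at least one
# non-stopword with the question, discarding the rest.

import re

STOPWORDS = frozenset(
    {"the", "a", "an", "of", "in", "is", "was", "who", "what", "which"}
)

def compress_context(question: str, context: str) -> str:
    """Filter context sentences by word overlap with the question."""
    q_words = set(re.findall(r"[a-z']+", question.lower())) - STOPWORDS
    kept = []
    for sent in re.split(r"(?<=[.!?])\s+", context):
        s_words = set(re.findall(r"[a-z']+", sent.lower())) - STOPWORDS
        if q_words & s_words:  # sentence mentions something the question asks about
            kept.append(sent)
    return " ".join(kept)

ctx = ("Paris is the capital of France. The Eiffel Tower is in Paris. "
       "Bananas are yellow. Mount Fuji is in Japan.")
result = compress_context("What is the capital of France?", ctx)
print(result)  # keeps the capital/France sentence, drops the unrelated ones
```

A production system would use embeddings or graph distance rather than bag-of-words overlap, but the effect is the same: a shorter, denser context reduces the reasoning burden on a small model.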