AI & ML: Study Breaks a Core RAG Assumption

An empirical study reveals that models of 7B parameters or fewer have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.

arXiv · March 13, 2026 · 2603.11513

Sanchit Pandey

Why it matters

The paper shows that even with oracle retrieval (the answer guaranteed to be in the context), small language models (SLMs) fail 85–100% of the time on questions they did not already know how to answer. This challenges the common assumption that RAG is a viable shortcut for giving small models new knowledge without fine-tuning.

From the abstract

Retrieval-augmented generation (RAG) is widely deployed to improve factual accuracy in language models, yet it remains unclear whether smaller models (7B parameters or less) can effectively utilize retrieved information. To investigate this question, we evaluate five model sizes from 360M to 8B across three architecture families (SmolLM2, Qwen2.5, and Llama 3.1) under four retrieval conditions: no retrieval, BM25, dense retrieval using E5-large-v2, and oracle retrieval, where the retrieved pa…
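The four retrieval conditions the abstract describes can be sketched as a tiny evaluation harness. This is a minimal sketch, assuming a `model(question, context) -> str` callable; the helper names (`Example`, `retrieve`, `accuracy`) and the token-overlap scorer are illustrative stand-ins, not the authors' code:

```python
# Sketch of the four retrieval conditions, under assumptions: real BM25
# (sparse) and E5-large-v2 (dense) scoring would replace `lexical_overlap`.
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    answer: str
    gold_passage: str  # passage known to contain the answer

def lexical_overlap(query: str, passage: str) -> float:
    # Crude stand-in scorer used for both "bm25" and "dense" below.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve(condition: str, ex: Example, corpus: list[str]) -> str:
    """Build the context string for one of the four retrieval conditions."""
    if condition == "none":
        return ""               # model must rely on parametric knowledge alone
    if condition == "oracle":
        return ex.gold_passage  # answer guaranteed to be in context
    if condition in ("bm25", "dense"):
        return max(corpus, key=lambda p: lexical_overlap(ex.question, p))
    raise ValueError(f"unknown condition: {condition}")

def accuracy(model, examples: list[Example], corpus: list[str], condition: str) -> float:
    """Fraction of examples whose answer appears in the model's output."""
    correct = sum(
        ex.answer.lower() in model(ex.question, retrieve(condition, ex, corpus)).lower()
        for ex in examples
    )
    return correct / len(examples)
```

Comparing the oracle condition against the no-retrieval baseline is what isolates utilization: with the answer guaranteed to be in context, any remaining errors reflect the model's failure to use it rather than a retrieval miss.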