Introduces the concept of a 'trainable' knowledge base for RAG that improves performance by distilling and writing back compact knowledge units.
March 27, 2026
Original Paper
Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment
arXiv · 2603.25737
The Takeaway
Instead of treating the RAG corpus as a static entity, this method uses labeled data to identify successful retrievals and optimize the corpus itself as an offline preprocessing step. This makes it compatible with any existing RAG pipeline and model, providing a universal performance boost.
From the abstract
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed back into the knowledge base.
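The loop the abstract describes can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it uses a toy token-overlap retriever, treats "the labeled answer appears in a retrieved document" as the retrieval-success signal, distills the supporting document down to the sentences containing the answer, and appends that compact unit to the corpus. All function names and the success criterion are assumptions made for the sketch.

```python
def tokenize(text):
    """Toy tokenizer: lowercase whitespace split."""
    return set(text.lower().split())

def retrieve(corpus, query, k=2):
    """Rank documents by token overlap with the query (stand-in retriever)."""
    scored = sorted(corpus, key=lambda d: len(tokenize(d) & tokenize(query)),
                    reverse=True)
    return scored[:k]

def distill(doc, answer):
    """Keep only sentences containing the labeled answer: a crude stand-in
    for the paper's evidence-distillation step."""
    kept = [s.strip() for s in doc.split(".") if answer.lower() in s.lower()]
    return ". ".join(kept) + "." if kept else None

def write_back(corpus, labeled_examples, k=2):
    """For each labeled (query, answer) pair, find retrievals that succeed,
    distill the supporting document, and index the compact unit back into
    the corpus (the write-back enrichment step)."""
    enriched = list(corpus)
    for query, answer in labeled_examples:
        for doc in retrieve(enriched, query, k):
            if answer.lower() in doc.lower():      # retrieval succeeded
                unit = distill(doc, answer)
                if unit and unit not in enriched:  # write back once
                    enriched.append(unit)
    return enriched

corpus = [
    "The Eiffel Tower is in Paris. It was finished in 1889. Many tourists visit.",
    "Mount Fuji is the tallest mountain in Japan.",
]
labeled = [("When was the Eiffel Tower finished", "1889")]
enriched = write_back(corpus, labeled)
print(enriched[-1])  # → It was finished in 1889.
```

Because the enrichment happens offline, any downstream retriever and generator can consume the enriched corpus unchanged, which is what makes the approach pipeline-agnostic.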