Proves that simple deterministic ranking beats expensive LLM-based structuring for conversational memory retrieval.
arXiv · March 17, 2026 · 2603.15599
The Takeaway
The paper argues that the current industry trend of using LLMs to 'structure' memory at ingestion time is unnecessary. A fast, deterministic pipeline (NER + CrossEncoder) achieves 98.6% recall while using 8.5x fewer tokens, drastically lowering the cost of long-term agent memory.
From the abstract
Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves from raw, unstructured conversation history using a fully deterministic pipeline: NER-weighted substring matching for recall, rule-based entity discovery for multi-hop expansion, and a CrossEncoder+ColBERT rank fusion stage -- the only learned component -- running on CPU in ~650ms. Oracle analysis on
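The pipeline the abstract describes — entity-weighted matching for recall, then a learned rank-fusion stage — can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the weighting scheme, the reciprocal-rank fusion formula, and the stubbed reranker scores (standing in for the CrossEncoder+ColBERT stage) are all assumptions.

```python
# Hypothetical sketch of a deterministic memory-retrieval pipeline in the
# spirit described by the abstract. All names, weights, and the fusion
# formula are illustrative assumptions, not the paper's actual method.

def entity_weighted_score(query_terms, entities, turn):
    """Score a conversation turn by substring hits; entity terms count double (assumed weight)."""
    text = turn.lower()
    score = 0.0
    for term in query_terms:
        if term.lower() in text:
            score += 2.0 if term in entities else 1.0
    return score

def fuse_ranks(scores_a, scores_b, k=60):
    """Reciprocal-rank fusion of two {doc_id: score} dicts (a common fusion choice)."""
    def ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {doc: r for r, doc in enumerate(ordered, start=1)}
    ra, rb = ranks(scores_a), ranks(scores_b)
    docs = set(scores_a) | set(scores_b)
    # Unseen docs get a rank just past the end of that list.
    return {d: 1.0 / (k + ra.get(d, len(ra) + 1)) +
               1.0 / (k + rb.get(d, len(rb) + 1))
            for d in docs}

history = [
    "We met Alice at the Berlin office last spring.",
    "The quarterly report is due Friday.",
    "Alice moved to the Munich team in June.",
]
query_terms = ["Alice", "Munich"]
entities = {"Alice", "Munich"}  # would come from an NER pass

recall_scores = {i: entity_weighted_score(query_terms, entities, t)
                 for i, t in enumerate(history)}
# Stand-in for the learned reranker's scores (CrossEncoder/ColBERT in the paper).
rerank_scores = {0: 0.4, 2: 0.9}
fused = fuse_ranks(recall_scores, rerank_scores)
best = max(fused, key=fused.get)
print(best)  # → 2 (the turn mentioning both Alice and Munich)
```

The point of the sketch is the shape of the system: only the final fusion input comes from a learned model, while recall and candidate scoring are cheap, deterministic string operations.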