Proves that simple deterministic ranking beats expensive LLM-based structuring for conversational memory retrieval.
arXiv · March 17, 2026 · 2603.15599
The Takeaway
The paper argues that the current industry trend of using LLMs to 'structure' memory at ingestion time is unnecessary. A fast, deterministic pipeline (NER + CrossEncoder) achieves 98.6% recall while using 8.5x fewer tokens, drastically lowering the cost of long-term agent memory.
From the abstract
Recent conversational memory systems invest heavily in LLM-based structuring at ingestion time and learned retrieval policies at query time. We show that neither is necessary. SmartSearch retrieves from raw, unstructured conversation history using a fully deterministic pipeline: NER-weighted substring matching for recall, rule-based entity discovery for multi-hop expansion, and a CrossEncoder+ColBERT rank fusion stage -- the only learned component -- running on CPU in ~650ms. Oracle analysis on
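The pipeline the abstract describes — entity-weighted matching for recall, then a learned rank-fusion stage — can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's implementation: the weighting scheme, the reciprocal-rank fusion formula, and the stubbed reranker scores (standing in for the CrossEncoder+ColBERT stage) are all assumptions.

```python
# Hypothetical sketch of a deterministic memory-retrieval pipeline in the
# spirit described by the abstract. All names, weights, and the fusion
# formula are illustrative assumptions, not the paper's actual method.

def entity_weighted_score(query_terms, entities, turn):
    """Score a conversation turn by substring hits; entity terms count double (assumed weight)."""
    text = turn.lower()
    score = 0.0
    for term in query_terms:
        if term.lower() in text:
            score += 2.0 if term in entities else 1.0
    return score

def fuse_ranks(scores_a, scores_b, k=60):
    """Reciprocal-rank fusion of two {doc_id: score} dicts (a common fusion choice)."""
    def ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {doc: r for r, doc in enumerate(ordered, start=1)}
    ra, rb = ranks(scores_a), ranks(scores_b)
    docs = set(scores_a) | set(scores_b)
    # Unseen docs get a rank just past the end of that list.
    return {d: 1.0 / (k + ra.get(d, len(ra) + 1)) +
               1.0 / (k + rb.get(d, len(rb) + 1))
            for d in docs}

history = [
    "We met Alice at the Berlin office last spring.",
    "The quarterly report is due Friday.",
    "Alice moved to the Munich team in June.",
]
query_terms = ["Alice", "Munich"]
entities = {"Alice", "Munich"}  # would come from an NER pass

recall_scores = {i: entity_weighted_score(query_terms, entities, t)
                 for i, t in enumerate(history)}
# Stand-in for the learned reranker's scores (CrossEncoder/ColBERT in the paper).
rerank_scores = {0: 0.4, 2: 0.9}
fused = fuse_ranks(recall_scores, rerank_scores)
best = max(fused, key=fused.get)
print(best)  # → 2 (the turn mentioning both Alice and Munich)
```

The point of the sketch is the shape of the system: only the final fusion input comes from a learned model, while recall and candidate scoring are cheap, deterministic string operations.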