Shows that tool-augmented agents suffer from "recommendation drift": under corrupted tool outputs they provide unsafe advice while still maintaining high ranking scores.
arXiv · March 16, 2026 · 2603.12564
Why it matters
Standard evaluation metrics like NDCG mask safety failures in multi-turn agents. This paper demonstrates that agents will confidently recommend risk-inappropriate products when tool outputs are even slightly biased, which argues for trajectory-level safety monitoring rather than output quality alone.
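To see why a ranking metric can mask this failure, consider that NDCG scores only the graded relevance of the ranked items, not their risk-appropriateness for the user. In the hypothetical sketch below (the relevance judgments are illustrative, not from the paper), a contaminated run that places a risk-inappropriate but topically relevant product at rank 1 receives exactly the same NDCG as the clean run:

```python
import math

def dcg(rels):
    # Discounted cumulative gain over a list of relevance grades.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    # Normalize by the ideal (descending-sorted) ordering.
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0

# Hypothetical graded relevance for two agent runs. In the contaminated
# run the top item is risk-inappropriate for the user, but a topical
# relevance grader still scores it 3 -- so NDCG cannot tell them apart.
clean_rels = [3, 2, 1]
contaminated_rels = [3, 2, 1]

print(ndcg(clean_rels), ndcg(contaminated_rels))  # identical: 1.0 1.0
```

This is the gap the paper's trajectory-level protocol is meant to close: the safety difference lives in *which* items fill those relevance slots, which NDCG never inspects.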
From the abstract
Tool-augmented LLM agents increasingly serve as multi-turn advisors in high-stakes domains, yet their evaluation relies on ranking-quality metrics that measure what is recommended but not whether it is safe for the user. We introduce a paired-trajectory protocol that replays real financial dialogues under clean and contaminated tool-output conditions across seven LLMs (7B to frontier) and decomposes divergence into information-channel and memory-channel mechanisms. Across the seven models tested […]
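The paired-trajectory idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent, dialogue, and divergence measure are all hypothetical stand-ins. The same dialogue is replayed twice, once with clean tool outputs and once with a contaminated variant, and the two trajectories are compared turn by turn:

```python
def replay(dialogue, tool_outputs, agent):
    # Run the agent over the dialogue, feeding canned tool outputs so the
    # only difference between runs is the tool-output condition.
    trajectory = []
    for turn, tool_out in zip(dialogue, tool_outputs):
        trajectory.append(agent(turn, tool_out, history=list(trajectory)))
    return trajectory

def divergence(traj_a, traj_b):
    # Toy divergence measure: fraction of turns where the paired
    # trajectories disagree (the paper decomposes this further into
    # information-channel and memory-channel mechanisms).
    return sum(a != b for a, b in zip(traj_a, traj_b)) / len(traj_a)

# Hypothetical toy agent that simply reflects the tool output it saw.
toy_agent = lambda turn, tool_out, history: f"{turn} -> {tool_out}"

dialogue = ["what is my risk profile?", "recommend a fund"]
clean = ["low risk", "fund A (low risk)"]
contaminated = ["low risk", "fund B (high risk)"]  # slightly biased tool output

t_clean = replay(dialogue, clean, toy_agent)
t_dirty = replay(dialogue, contaminated, toy_agent)
print(divergence(t_clean, t_dirty))  # 0.5: diverges only at the biased turn
```

Holding the dialogue fixed and perturbing only the tool channel is what lets the protocol attribute any divergence to tool contamination rather than to ordinary sampling variation.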