AI & ML · Breaks Assumption

Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.

arXiv · March 13, 2026 · 2603.11266

Raj Sanjay Shah, Jing Huang, Keerthiram Murugesan, Nathalie Baracaldo, Diyi Yang

Why it matters

This work exposes the brittleness of current safety and privacy unlearning techniques. It shows that 'forgotten' knowledge remains accessible through alternative computation pathways, necessitating more robust evaluation beyond static Q&A benchmarks.

From the abstract

Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be forgotten. However, existing unlearning methods are brittle: minor query modifications, such as multi-hop reasoning and entity aliasing, can recover supposedly forgotten information. As a result, current evaluation metrics often create an illusion of effectiveness, failing to detect these vulnerabilities due to reliance on static, unstructured benchmarks. […]
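To make the abstract's point concrete, here is a minimal sketch of how aliased and multi-hop probes might be generated to stress-test an unlearning evaluation, in contrast to a single static benchmark question. This is an illustrative assumption, not the paper's actual evaluation code; the entity, its aliases, and the helper names are all hypothetical.

```python
# Hypothetical sketch: probing 'unlearned' knowledge via query rewrites.
# A static benchmark asks one direct question; aliased and multi-hop probes
# reach the same fact through alternative phrasings. Entity names and
# aliases below are illustrative, not taken from the paper.

def direct_probe(entity: str, relation: str) -> str:
    """The kind of static query typical unlearning benchmarks rely on."""
    return f"What is the {relation} of {entity}?"

def aliased_probes(aliases: list[str], relation: str) -> list[str]:
    """Ask the same question under each known alias of the entity."""
    return [direct_probe(alias, relation) for alias in aliases]

def multi_hop_probe(indirect_reference: str, relation: str) -> str:
    """Refer to the entity indirectly via an intermediate fact (one extra hop)."""
    return f"What is the {relation} of {indirect_reference}?"

# Example: one direct probe, two alias rewrites, one multi-hop rewrite.
aliases = ["J. K. Rowling", "Joanne Rowling", "Robert Galbraith"]
probes = [direct_probe(aliases[0], "birth year")]
probes += aliased_probes(aliases[1:], "birth year")
probes.append(multi_hop_probe("the author of the Cormoran Strike novels",
                              "birth year"))

for p in probes:
    print(p)
```

If a model answers any of the rewritten probes correctly after "forgetting" the direct one, the unlearning was superficial, which is the failure mode the paper's evaluation targets.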