LLM agents can now autonomously re-identify anonymous individuals by combining sparse, non-identifying cues with public data.
arXiv · March 20, 2026 · 2603.18382
The Takeaway
This marks a major privacy shift: LLM agents can execute de-anonymization attacks (e.g., matching Netflix Prize records with 79% accuracy) without bespoke engineering, forcing a move from 'data anonymization' safeguards toward inference-driven privacy risk assessments.
From the abstract
Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formali…
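The linkage step the abstract describes, matching sparse, individually non-identifying cues against public records, can be sketched as a toy scoring function in the spirit of the classic Netflix Prize attack. The cue format, weights, and threshold below are illustrative assumptions, not the paper's actual agent pipeline:

```python
from datetime import date, timedelta

def similarity(anon_cues, public_profile, date_tolerance_days=14):
    """Score how well a public profile explains a set of sparse cues.

    Each cue is an (item, approximate_date) pair; a cue matches if the
    profile contains the same item within the date tolerance. A real
    attack would weight rare items more heavily; here every match
    counts equally (a simplifying assumption).
    """
    tol = timedelta(days=date_tolerance_days)
    matched = 0
    for item, when in anon_cues:
        profile_date = public_profile.get(item)
        if profile_date is not None and abs(profile_date - when) <= tol:
            matched += 1
    return matched / len(anon_cues)

def best_match(anon_cues, public_profiles, threshold=0.6):
    """Return the profile name whose activity best explains the cues,
    or None if no candidate clears the threshold. (The original attack
    also checks 'eccentricity' -- the gap between the top two scores --
    to avoid false positives; omitted here for brevity.)"""
    scored = [(similarity(anon_cues, p), name)
              for name, p in public_profiles.items()]
    top_score, top_name = max(scored)
    return top_name if top_score >= threshold else None

# Toy data: three anonymized ratings vs. two public profiles.
cues = [("Movie A", date(2020, 1, 10)),
        ("Movie B", date(2020, 3, 5)),
        ("Movie C", date(2020, 6, 1))]
profiles = {
    "alice": {"Movie A": date(2020, 1, 12), "Movie B": date(2020, 3, 1)},
    "bob":   {"Movie C": date(2019, 6, 1)},
}
print(best_match(cues, profiles))  # → alice (2 of 3 cues match within tolerance)
```

The paper's point is that an LLM agent no longer needs this bespoke scoring code: it can gather the public side of the join and perform the corroboration itself.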