LLM agents can now autonomously re-identify anonymous individuals by combining sparse, non-identifying cues with public data.
arXiv · March 20, 2026 · 2603.18382
The Takeaway
This marks a major privacy shift: LLM agents can execute de-anonymization attacks (e.g., matching Netflix Prize records with 79% accuracy) without bespoke engineering, forcing a move from 'data anonymization' safeguards toward inference-driven privacy risk assessments.
From the abstract
Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formali…
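The linkage step the abstract describes, matching sparse, individually non-identifying cues against public records, can be sketched as a toy scoring function in the spirit of the classic Netflix Prize attack. The cue format, weights, and threshold below are illustrative assumptions, not the paper's actual agent pipeline:

```python
from datetime import date, timedelta

def similarity(anon_cues, public_profile, date_tolerance_days=14):
    """Score how well a public profile explains a set of sparse cues.

    Each cue is an (item, approximate_date) pair; a cue matches if the
    profile contains the same item within the date tolerance. A real
    attack would weight rare items more heavily; here every match
    counts equally (a simplifying assumption).
    """
    tol = timedelta(days=date_tolerance_days)
    matched = 0
    for item, when in anon_cues:
        profile_date = public_profile.get(item)
        if profile_date is not None and abs(profile_date - when) <= tol:
            matched += 1
    return matched / len(anon_cues)

def best_match(anon_cues, public_profiles, threshold=0.6):
    """Return the profile name whose activity best explains the cues,
    or None if no candidate clears the threshold. (The original attack
    also checks 'eccentricity' -- the gap between the top two scores --
    to avoid false positives; omitted here for brevity.)"""
    scored = [(similarity(anon_cues, p), name)
              for name, p in public_profiles.items()]
    top_score, top_name = max(scored)
    return top_name if top_score >= threshold else None

# Toy data: three anonymized ratings vs. two public profiles.
cues = [("Movie A", date(2020, 1, 10)),
        ("Movie B", date(2020, 3, 5)),
        ("Movie C", date(2020, 6, 1))]
profiles = {
    "alice": {"Movie A": date(2020, 1, 12), "Movie B": date(2020, 3, 1)},
    "bob":   {"Movie C": date(2019, 6, 1)},
}
print(best_match(cues, profiles))  # → alice (2 of 3 cues match within tolerance)
```

The paper's point is that an LLM agent no longer needs this bespoke scoring code: it can gather the public side of the join and perform the corroboration itself.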