AI & ML New Capability

A white-box membership inference attack that uses 'gradient-induced feature drift' to outperform existing confidence-based methods.

April 2, 2026

Original Paper

G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs

Ravi Ranjan, Utkarsh Grover, Xiaomin Lin, Agoritsa Polyzou

arXiv · 2604.00419

The Takeaway

By applying a single gradient-ascent step and measuring the 'drift' in internal representations, practitioners can audit whether specific data was used in training with much higher accuracy than loss-based attacks, even when members and non-members come from identical distributions.
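The core idea can be illustrated on a toy model. The sketch below is a minimal, hypothetical stand-in for the paper's method: a logistic-regression "model" replaces the LLM, the logit plays the role of an internal representation, and the drift score is how far that representation moves after a single gradient-ascent step on the example's own loss. All names and hyperparameters here are illustrative assumptions, not taken from the paper; the only grounded idea is "perturb the weights by one gradient step, then measure representation drift."

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points, first 100 are "members" (used for training).
d = 20
X = rng.normal(size=(200, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=200) > 0).astype(float)
members, nonmembers = X[:100], X[100:]
y_mem, y_non = y[:100], y[100:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Fit the toy model on members only (full-batch gradient descent).
w = np.zeros(d)
for _ in range(500):
    p = sigmoid(members @ w)
    w -= 0.1 * members.T @ (p - y_mem) / len(members)

def drift_score(x, label, w, lr=0.5):
    """One gradient-ascent step on this example's loss, then measure
    how far the example's internal feature (here: its logit) moves."""
    logit_before = x @ w
    grad = (sigmoid(logit_before) - label) * x  # per-example loss gradient
    logit_after = x @ (w + lr * grad)           # single ascent step
    return abs(logit_after - logit_before)

mem_scores = [drift_score(x, t, w) for x, t in zip(members, y_mem)]
non_scores = [drift_score(x, t, w) for x, t in zip(nonmembers, y_non)]

# Members are already well fit, so their per-example gradients (and
# hence their drift) tend to be smaller than non-members'.
print("mean member drift:    ", np.mean(mem_scores))
print("mean non-member drift:", np.mean(non_scores))
```

In this sketch the gap between the two mean drifts is the membership signal; thresholding the score turns it into a binary attack. A real white-box attack on an LLM would compute the gradient step and the representation drift with autograd over hidden states rather than a single logit.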

From the abstract

Large language models (LLMs) are trained on massive web-scale corpora, raising growing concerns about privacy and copyright. Membership inference attacks (MIAs) aim to determine whether a given example was used during training. Existing LLM MIAs largely rely on output probabilities or loss values and often perform only marginally better than random guessing when members and non-members are drawn from the same distribution. We introduce G-Drift MIA, a white-box membership inference method based on gradient-induced feature drift.