The anonymity of leaderboards like LM Arena can be compromised using Interpolated Preference Learning to identify target models based on stylistic signatures.
arXiv · March 17, 2026 · 2603.15220
The Takeaway
This exposes a fundamental security vulnerability in human-voting-based leaderboards, showing that 'blind' tests can be gamed or de-anonymized. It forces the community to rethink how to maintain the integrity of competitive LLM evaluations.
From the abstract
Strict anonymity of model responses is key to the reliability of voting-based leaderboards such as LM Arena. While prior studies have attempted to compromise this assumption using simple statistical features like TF-IDF or bag-of-words, these methods often lack the discriminative power to distinguish between stylistically similar or within-family models. To overcome these limitations and expose the severity of this vulnerability, we introduce INTERPOL, a model-driven identification framework that …
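To make the baseline concrete: the "bag-of-words" approaches the abstract critiques amount to building a word-frequency profile per model from known responses, then attributing an anonymous response to the model with the most similar profile. The sketch below is a minimal illustration of that baseline idea, not the paper's INTERPOL method; the model names and toy corpora are invented for the example.

```python
from collections import Counter
import math

def bow(text):
    # Bag-of-words: lowercase whitespace-token counts
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def build_profile(responses):
    # Aggregate token counts over all known responses of one model
    profile = Counter()
    for r in responses:
        profile.update(bow(r))
    return profile

def identify(response, profiles):
    # Attribute an anonymous response to the most similar profile
    return max(profiles, key=lambda m: cosine(bow(response), profiles[m]))

# Hypothetical toy corpora standing in for collected arena responses
corpora = {
    "model_a": ["certainly here is a detailed answer",
                "certainly let us break this down"],
    "model_b": ["sure thing here you go",
                "sure thing the short answer is"],
}
profiles = {m: build_profile(rs) for m, rs in corpora.items()}
print(identify("certainly here is the breakdown", profiles))  # -> model_a
```

As the abstract notes, such surface-level lexical signatures blur between stylistically similar or same-family models, which is the gap a model-driven approach like INTERPOL targets.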