AI & ML · New Capability

An autonomous agentic pipeline discovered novel white-box adversarial attacks that outperform existing methods by up to 300%.

March 26, 2026

Original Paper

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko

arXiv · 2603.24511

The Takeaway

This work demonstrates that safety and security research can be significantly automated. The discovered algorithms achieve 100% success rates against highly aligned models such as Meta-SecAlign-70B, suggesting that current human-designed jailbreak defenses are systematically vulnerable to automated red-teaming.

From the abstract

LLM agents like Claude Code can not only write code but also be used for autonomous AI research and engineering [rank2026posttrainbench; novikov2025alphaevolve]. We show that an autoresearch-style pipeline [karpathy2026autoresearch] powered by Claude Code discovers novel white-box adversarial attack algorithms that significantly outperform all existing (30+) methods in jailbreaking and prompt injection evaluations. Starting from existing attack implementations