Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.
arXiv · March 13, 2026 · 2603.11214
Why it matters
The paper shows that increasing inference-time compute (up to 100M tokens) can improve success rates by 59% on complex, multi-step cyber attacks. Safety evaluations that rely on fixed token budgets may therefore significantly underestimate the risk posed by near-future models.
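To make the log-linear claim concrete, here is a minimal sketch of fitting a score against the logarithm of the token budget and comparing a fixed-budget prediction with a larger one. The function names (`fit_log_linear`, `predict`) and every number in it are hypothetical placeholders chosen for illustration; none of them are taken from the paper, and the paper's own fitting procedure may differ.

```python
import math


def fit_log_linear(budgets, scores):
    """Least-squares fit of score = intercept + slope * log10(budget)."""
    xs = [math.log10(b) for b in budgets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, budget):
    """Predicted score at a given token budget under the fitted trend."""
    return intercept + slope * math.log10(budget)


# Hypothetical placeholder data, NOT results from the paper:
# token budgets and an illustrative "attack steps completed" score.
budgets = [1e5, 1e6, 1e7, 1e8]
scores = [2.0, 4.0, 6.0, 8.0]

intercept, slope = fit_log_linear(budgets, scores)

# Compare an evaluation capped at a fixed budget with one given 100x more tokens.
fixed_budget_score = predict(intercept, slope, 1e6)
large_budget_score = predict(intercept, slope, 1e8)

print(f"slope per 10x tokens: {slope:.2f}")
print(f"gap between 1M and 100M tokens: {large_budget_score - fixed_budget_score:.2f}")
```

Under a trend like this, each tenfold increase in tokens adds a constant increment to the score, so an evaluation capped at a small budget reports a value that sits well below what the same model reaches with more compute.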
From the abstract
We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges (a 32-step corporate network attack and a 7-step industrial control system attack) that require chaining heterogeneous capabilities across extended action sequences. By comparing seven models released over an eighteen-month period (August 2024 to February 2026) at varying inference-time compute budgets, we observe two capability trends. First, model performance scales log-linearly with infe