AI & ML Breaks Assumption

Resolves a long-standing open problem in bandit theory by achieving optimal dynamic regret without knowing the number of environment switches.

March 30, 2026

Original Paper

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale

arXiv · 2603.25916

The Takeaway

Achieving optimal dynamic regret in unconstrained adversarial settings without prior knowledge of the environment's volatility (the number of comparator switches, $S_T$) was a long-standing theoretical bottleneck. Resolving it enables more robust, self-tuning online learning algorithms.

From the abstract

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_T$ in $\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t+1}\}$.
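To make the two ingredients of the abstract concrete, here is a minimal sketch: the switch count $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t+1}\}$ computed directly from a comparator sequence, and a generic exponential-weights step for aggregating the plays of several base algorithms under linear losses. The combiner shown is a standard hedging update for illustration, not the paper's actual construction; all function names and parameters are assumptions.

```python
import numpy as np

def num_switches(comparators):
    """S_T = sum_t I{u_t != u_{t+1}}: switches in a comparator sequence."""
    return sum(int(not np.array_equal(u, v))
               for u, v in zip(comparators[:-1], comparators[1:]))

def combine_learners(base_plays, loss_vectors, eta=0.1):
    """Aggregate K base algorithms with a generic exponential-weights step.

    base_plays:   (T, K, d) array, the play of each base algorithm per round.
    loss_vectors: (T, d) array, adversarial linear loss vectors g_t.
    Returns the (T, d) plays of the meta-algorithm (a weighted average of
    base plays; an illustrative hedging scheme, not the paper's combiner).
    """
    T, K, d = base_plays.shape
    w = np.ones(K)                      # unnormalized expert weights
    plays = np.zeros((T, d))
    for t in range(T):
        p = w / w.sum()                 # current distribution over learners
        plays[t] = p @ base_plays[t]    # aggregate play x_t
        base_losses = base_plays[t] @ loss_vectors[t]  # <g_t, x_t^k> per learner
        w *= np.exp(-eta * base_losses)  # multiplicative update
    return plays
```

With two base learners on a one-dimensional problem (one always playing $-1$, one always $+1$) and constant loss vector $g_t = 1$, the weights concentrate on the lower-loss learner and the aggregate play drifts toward $-1$.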