A widely accepted mathematical assumption used to speed up AI decision-making algorithms has just been proven wrong.
April 24, 2026
Original Paper
Revisiting Subgradient Dominance in Robust MDPs: Counterexamples, Hardness, and Sufficient Conditions
arXiv · 2604.21177
The Takeaway
The theory of robust Markov decision processes (RMDPs) has relied on a property called subgradient dominance to guarantee that first-order methods find a near-optimal policy efficiently. This paper provides counterexamples showing that the property fails in general, and that finding an optimal robust policy is much harder than previously thought: the problem is NP-hard, so no efficient algorithm for it is known, and none is expected unless P = NP. Many current algorithms for AI planning in uncertain environments are built on this broken guarantee. The result forces researchers to redesign how decision-making in complex, unpredictable worlds is analyzed, and to find new conditions under which AI plans remain provably safe and efficient.
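For a sense of what the refuted property asserts, the inequality below sketches the general shape of a (sub)gradient dominance condition. This is an illustrative form rather than the paper's exact statement; the symbols $J$, $\Pi$, $c$, and $g_\pi$ are generic placeholders.

```latex
% Illustrative shape of a subgradient dominance condition (not the paper's
% exact statement). J is the robust return maximized over the policy set Pi,
% g_pi is any subgradient of J at pi, and c > 0 is a problem-dependent constant.
\[
  J(\pi^\star) - J(\pi)
  \;\le\;
  c \cdot \max_{\bar{\pi} \in \Pi}
    \left\langle \bar{\pi} - \pi,\; g_{\pi} \right\rangle
  \qquad \text{for all } \pi \in \Pi .
\]
```

When an inequality of this kind holds, a small achievable subgradient improvement at $\pi$ certifies that $\pi$ is already near-optimal, which is the mechanism behind polynomial-time convergence guarantees for projected subgradient methods; the paper's counterexamples show that RMDPs need not satisfy it.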
From the abstract
Projected subgradient descent (PSD) has gained popularity for solving robust Markov decision processes (RMDPs) because it applies to a broader class of uncertainty sets than traditional dynamic programming. Existing work claims that RMDPs with a general compact uncertainty set satisfy the subgradient dominance property, under which exact PSD converges to an $\varepsilon$-optimal policy in a polynomial number of updates (e.g., Wang et al., 2023). We show that these claims are incorrect. Even when …
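To make the algorithm under discussion concrete, here is a minimal sketch of exact projected subgradient ascent on the robust return. The setup is a simplifying assumption, not taken from the paper: a small tabular MDP with a direct policy parameterization and a finite uncertainty set of transition kernels, so the adversary's worst case can be found by enumeration.

```python
import numpy as np

# Minimal sketch of exact projected subgradient ascent for a robust MDP.
# Assumptions for illustration only (not from the paper): tabular MDP,
# direct policy parameterization, finite uncertainty set of kernels.

rng = np.random.default_rng(0)
S, A, K = 4, 3, 5          # states, actions, kernels in the uncertainty set
gamma = 0.9                # discount factor
rho = np.full(S, 1.0 / S)  # initial state distribution
r = rng.random((S, A))     # reward table r[s, a]
kernels = rng.dirichlet(np.ones(S), size=(K, S, A))  # P_k[s, a, s']

def project_rows_onto_simplex(v):
    """Euclidean projection of each row of v onto the probability simplex
    (standard sort-based algorithm)."""
    u = np.sort(v, axis=-1)[:, ::-1]
    css = np.cumsum(u, axis=-1) - 1.0
    idx = np.arange(1, v.shape[-1] + 1)
    rho_n = (u - css / idx > 0).sum(axis=-1, keepdims=True)
    theta = np.take_along_axis(css, rho_n - 1, axis=-1) / rho_n
    return np.maximum(v - theta, 0.0)

def evaluate(P, pi):
    """Exact return, Q-table, and discounted occupancy of pi under kernel P."""
    P_pi = np.einsum('sa,sat->st', pi, P)            # induced state kernel
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum('sat,t->sa', P, V)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return rho @ V, Q, d

pi = np.full((S, A), 1.0 / A)   # uniform initial policy
eta = 0.1                       # illustrative step size
for t in range(200):
    # Adversary: the kernel in the uncertainty set minimizing the return.
    J, Q, d = min((evaluate(P, pi) for P in kernels), key=lambda x: x[0])
    # Danskin-style subgradient: policy gradient at the worst-case kernel.
    grad = d[:, None] * Q / (1.0 - gamma)
    # Ascent step on the robust return (the paper analyzes the equivalent
    # descent form), then projection back onto per-state simplices.
    pi = project_rows_onto_simplex(pi + eta * grad)

print("robust return:", min(evaluate(P, pi)[0] for P in kernels))
```

The Danskin-style choice of subgradient, i.e. the exact policy gradient evaluated at the worst-case kernel, is the standard device for this nonsmooth min-over-kernels objective; the paper's point is that, absent subgradient dominance, iterations like these need not reach an $\varepsilon$-optimal policy in polynomially many updates.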