AI & ML Breaks Assumption

Demonstrates that the two standard mathematical interpretations of Temporal Difference (TD) error diverge in deep reinforcement learning.

March 24, 2026

Original Paper

Deep Reinforcement Learning and The Tale of Two Temporal Difference Errors

Juan Sebastian Rojas, Chi-Guhn Lee

arXiv · 2603.21921

The Takeaway

The paper reveals that the 'temporally successive predictions' and 'bootstrapped target' interpretations of the TD error, assumed equivalent in RL since 1988, are not numerically equivalent under non-linear function approximation. This has immediate implications for deep RL algorithms that use the TD error for auxiliary tasks such as prioritized sampling or differential reward estimation.

From the abstract

The temporal difference (TD) error was first formalized in Sutton (1988), where it was first characterized as the difference between temporally successive predictions, and later, in that same work, formulated as the difference between a bootstrapped target and a prediction. Since then, these two interpretations of the TD error have been used interchangeably in the literature, with the latter eventually being adopted as the standard critic loss in deep reinforcement learning (RL) architectures.
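One way the two readings can come apart (a minimal sketch, not the paper's construction) is when the value function is non-linear and its weights change between time steps: the bootstrapped-target form evaluates both predictions with the current weights, while a literal "temporally successive predictions" reading compares the prediction at time t against the prediction at time t+1, made after an update. The toy value function and step size below are illustrative assumptions.

```python
import numpy as np

# Tiny non-linear value function V(s) = w2 * tanh(w1 * s).
def V(w, s):
    w1, w2 = w
    return w2 * np.tanh(w1 * s)

def grad_V(w, s):
    # Gradient of V with respect to (w1, w2).
    w1, w2 = w
    h = np.tanh(w1 * s)
    return np.array([w2 * (1.0 - h**2) * s, h])

gamma, alpha = 0.99, 0.1          # discount factor, step size
w_t = np.array([0.5, 1.0])        # current weights
s, r, s_next = 1.0, 1.0, 2.0      # one transition (s, r, s')

# Reading 1: bootstrapped target minus prediction, both evaluated
# with the same current weights w_t (the standard deep-RL critic loss).
delta_bootstrap = r + gamma * V(w_t, s_next) - V(w_t, s)

# One semi-gradient TD(0) step produces the next weights w_{t+1}.
w_next = w_t + alpha * delta_bootstrap * grad_V(w_t, s)

# Reading 2: difference between temporally successive predictions,
# where the prediction at t+1 is made with the updated weights.
delta_successive = r + gamma * V(w_next, s_next) - V(w_t, s)

print(delta_bootstrap, delta_successive)  # the two values differ
```

In the tabular or fixed-weight setting the two expressions coincide algebraically; the gap here comes entirely from evaluating the successor prediction with updated weights of a non-linear approximator.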