Introduces Dual Q-DM, the first non-adversarial imitation learning method theoretically guaranteed to eliminate compounding errors.
March 25, 2026
Original Paper
Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
arXiv · 2603.22713
The Takeaway
The paper challenges the assumption that existing non-adversarial methods such as IQ-Learn inherently outperform behavioral cloning, proving that they still suffer quadratic error growth in the task horizon. The proposed Bellman-constraint mechanism lets the learned policy generalize to states the expert never visited, achieving linear error growth without the instability of adversarial training.
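For scale, here is the standard worst-case picture from the imitation-learning literature, written in generic notation (these are the classic horizon bounds, not the paper's exact statements): with per-step imitation error ε and horizon H, behavioral cloning's suboptimality can compound quadratically, whereas a method that stays consistent on its own visited states scales linearly.

```latex
% Classic worst-case suboptimality bounds (generic notation, not the paper's):
% BC compounds quadratically in the horizon; error-correcting methods scale linearly.
J(\pi^{E}) - J(\pi^{\mathrm{BC}}) \le O\!\left(\varepsilon H^{2}\right),
\qquad
J(\pi^{E}) - J(\pi^{\mathrm{corrected}}) \le O\!\left(\varepsilon H\right).
```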
From the abstract
Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC…
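A minimal sketch of the general idea as suggested by the takeaway, not the paper's actual algorithm: a non-adversarial Q-based imitation loss that combines a BC-style term on expert data with a Bellman-consistency penalty on the learner's own online transitions. Everything here (discrete actions, the `QNet` architecture, the batch formats, the weight `lam`) is an illustrative assumption.

```python
# Hypothetical sketch: non-adversarial Q-based imitation with a Bellman-consistency
# penalty on policy-visited (online) transitions. Names and structure are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Simple Q-network for a discrete-action setting (assumed for illustration)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # Q(s, ·)

def imitation_loss(q: QNet, expert_batch, online_batch,
                   gamma: float = 0.99, lam: float = 1.0) -> torch.Tensor:
    """BC-like term on expert data plus a Bellman residual on online data."""
    obs_e, act_e = expert_batch                       # expert states and actions
    obs_o, act_o, next_obs_o, done_o = online_batch   # learner-visited transitions

    # (1) Match the expert: push Q toward ranking expert actions highest
    #     (softmax cross-entropy over Q-values as a BC-style surrogate).
    bc_term = F.cross_entropy(q(obs_e), act_e)

    # (2) Bellman-style consistency on states the learner actually visits:
    #     Q(s, a) should agree with the bootstrapped value of the next state.
    #     No environment reward is used; only self-consistency is enforced.
    with torch.no_grad():
        next_v = q(next_obs_o).max(dim=1).values * (1.0 - done_o)
    q_sa = q(obs_o).gather(1, act_o.unsqueeze(1)).squeeze(1)
    bellman_term = F.mse_loss(q_sa, gamma * next_v)

    return bc_term + lam * bellman_term
```

The design point this sketch tries to isolate is the one the takeaway credits: the Bellman term asks Q to remain self-consistent along the learner's own trajectories, off the expert's support, which is the ingredient associated with linear rather than quadratic error growth.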