Introduces Dual Q-DM, the first non-adversarial imitation learning method theoretically guaranteed to eliminate compounding errors.
March 25, 2026
Original Paper
Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
arXiv · 2603.22713
The Takeaway
The paper challenges the assumption that existing non-adversarial methods such as IQ-Learn inherently outperform behavioral cloning, proving that they still suffer quadratic error growth in the task horizon. The proposed Bellman-constraint mechanism lets the learned policy generalize to states the expert never visited, achieving linear error growth without the instability of adversarial training.
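For scale, here is the standard worst-case picture from the imitation-learning literature, written in generic notation (these are the classic horizon bounds, not the paper's exact statements): with per-step imitation error ε and horizon H, behavioral cloning's suboptimality can compound quadratically, whereas a method that stays consistent on its own visited states scales linearly.

```latex
% Classic worst-case suboptimality bounds (generic notation, not the paper's):
% BC compounds quadratically in the horizon; error-correcting methods scale linearly.
J(\pi^{E}) - J(\pi^{\mathrm{BC}}) \le O\!\left(\varepsilon H^{2}\right),
\qquad
J(\pi^{E}) - J(\pi^{\mathrm{corrected}}) \le O\!\left(\varepsilon H\right).
```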
From the abstract
Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC…
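A minimal sketch of the general idea as suggested by the takeaway, not the paper's actual algorithm: a non-adversarial Q-based imitation loss that combines a BC-style term on expert data with a Bellman-consistency penalty on the learner's own online transitions. Everything here (discrete actions, the `QNet` architecture, the batch formats, the weight `lam`) is an illustrative assumption.

```python
# Hypothetical sketch: non-adversarial Q-based imitation with a Bellman-consistency
# penalty on policy-visited (online) transitions. Names and structure are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Simple Q-network for a discrete-action setting (assumed for illustration)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # Q(s, ·)

def imitation_loss(q: QNet, expert_batch, online_batch,
                   gamma: float = 0.99, lam: float = 1.0) -> torch.Tensor:
    """BC-like term on expert data plus a Bellman residual on online data."""
    obs_e, act_e = expert_batch                       # expert states and actions
    obs_o, act_o, next_obs_o, done_o = online_batch   # learner-visited transitions

    # (1) Match the expert: push Q toward ranking expert actions highest
    #     (softmax cross-entropy over Q-values as a BC-style surrogate).
    bc_term = F.cross_entropy(q(obs_e), act_e)

    # (2) Bellman-style consistency on states the learner actually visits:
    #     Q(s, a) should agree with the bootstrapped value of the next state.
    #     No environment reward is used; only self-consistency is enforced.
    with torch.no_grad():
        next_v = q(next_obs_o).max(dim=1).values * (1.0 - done_o)
    q_sa = q(obs_o).gather(1, act_o.unsqueeze(1)).squeeze(1)
    bellman_term = F.mse_loss(q_sa, gamma * next_v)

    return bc_term + lam * bellman_term
```

The design point this sketch tries to isolate is the one the takeaway credits: the Bellman term asks Q to remain self-consistent along the learner's own trajectories, off the expert's support, which is the ingredient associated with linear rather than quadratic error growth.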