A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.
arXiv · March 13, 2026 · 2603.11117
Why it matters
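Historically, decision trees have been grown with greedy, non-differentiable procedures (CART, or the split search used in XGBoost), so they could not be trained jointly with neural networks through backpropagation. This differentiable approach reportedly achieves state-of-the-art results and lets practitioners keep the interpretability of decision trees inside larger deep learning pipelines, for example in multimodal or reinforcement learning (RL) tasks.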
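As a rough illustration of the idea, and not the paper's algorithm (this summary does not give its details), the sketch below fits a single axis-aligned split by gradient descent. It assumes a sigmoid relaxation of the hard test `x > t`, controlled by a hypothetical `temperature` parameter. The function names, toy data, and hyperparameters are all illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_soft_stump(xs, ys, steps=3000, lr=0.1, temperature=0.1):
    """Fit a one-split regression stump on one feature by gradient descent.

    The hard test `x > t` is relaxed to p = sigmoid((x - t) / temperature),
    so the threshold t receives a gradient. This relaxation is an assumption
    made for illustration only.
    """
    n = len(xs)
    t = sum(xs) / n          # start the threshold at the feature mean
    left, right = 0.0, 0.0   # leaf predictions
    for _ in range(steps):
        g_t = g_left = g_right = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid((x - t) / temperature)   # soft probability of going right
            pred = (1.0 - p) * left + p * right
            err = pred - y                       # derivative of 0.5 * err**2 w.r.t. pred
            g_left += err * (1.0 - p)
            g_right += err * p
            # chain rule: dpred/dp = right - left, dp/dt = -p * (1 - p) / temperature
            g_t += err * (right - left) * (-p * (1.0 - p) / temperature)
        t -= lr * g_t / n
        left -= lr * g_left / n
        right -= lr * g_right / n
    return t, left, right


def predict_hard(x, t, left, right):
    """After training, the stump is read as an ordinary hard split."""
    return right if x > t else left


# Toy data: the label switches from 0 to 1 between x = 0.5 and x = 0.6.
xs = [i / 10 for i in range(11)]
ys = [1.0 if x > 0.55 else 0.0 for x in xs]
t, left, right = train_soft_stump(xs, ys)
```

On this toy data the learned threshold is expected to settle between the two groups of points, so the hard stump separates them. A greedy learner such as CART would instead find the split by enumerating candidate thresholds. The point of the gradient-based version is that `t`, `left`, and `right` are ordinary parameters that could be updated alongside the weights of a surrounding network.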
From the abstract
Tree-based models are widely recognized for their interpretability and have proven effective in various application domains, particularly in high-stakes domains. However, learning decision trees (DTs) poses a significant challenge due to their combinatorial complexity and discrete, non-differentiable nature. As a result, traditional methods such as CART, which rely on greedy search procedures, remain the most widely used approaches. These methods make locally optimal decisions at each node, cons