Reveals that models with identical predictive performance produce fundamentally different feature attributions based solely on their hypothesis class.
arXiv · March 18, 2026 · 2603.15821
The Takeaway
This 'Explanation Lottery' challenges the reliability of XAI for auditing and regulation, showing that model selection is not explanation-neutral. The paper also introduces a diagnostic score for identifying when explanations actually are stable across different architectures.
From the abstract
The assumption that prediction-equivalent models produce equivalent explanations underlies many practices in explainable AI, including model selection, auditing, and regulatory evaluation. In this work, we show that this assumption does not hold. Through a large-scale empirical study across 24 datasets and multiple model classes, we find that models with identical predictive behavior can produce substantially different feature attributions. This disagreement is highly structured […]
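To make the phenomenon concrete, here is a minimal sketch of the core experiment, not the paper's actual protocol or its diagnostic score: fit two different model classes on the same data, verify their predictions largely agree, then compare their feature attributions. The dataset, the two model classes, permutation importance as the attribution method, and Spearman correlation as the agreement measure are all illustrative choices.

```python
# Hypothetical illustration of the "explanation lottery": high prediction
# agreement between two model classes need not imply agreeing attributions.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hypothesis classes trained on identical data.
models = {
    "logistic": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    "forest": RandomForestClassifier(random_state=0).fit(X_tr, y_tr),
}

# Check that predictive behavior is (approximately) equivalent.
preds = {name: m.predict(X_te) for name, m in models.items()}
print(f"prediction agreement: {np.mean(preds['logistic'] == preds['forest']):.3f}")

# Feature attributions via permutation importance on held-out data.
attributions = {
    name: permutation_importance(m, X_te, y_te, n_repeats=10,
                                 random_state=0).importances_mean
    for name, m in models.items()
}

# Low rank correlation despite high prediction agreement is the
# explanation lottery in miniature.
rho, _ = spearmanr(attributions["logistic"], attributions["forest"])
print(f"attribution rank correlation (Spearman): {rho:.3f}")
```

Permutation importance is used here only because it is model-agnostic, so both hypothesis classes can be attributed with the same procedure; any other attribution method (e.g., SHAP) could be substituted to run the same comparison.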