Provides empirical evidence that structural sparsity in Vision Transformers does not lead to improved semantic interpretability.
arXiv · March 18, 2026 · 2603.15919
The Takeaway
Contrary to common belief, pruning does not isolate simpler functional modules but merely redistributes computation across active nodes. This suggests that practitioners should not rely on weight sparsity as a proxy for model transparency or circuit simplicity.
From the abstract
Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity itself leads to improved semantic interpretability. In this work, we systematically evaluate the relationship between weight sparsity and interpretability in Vision Transformers using DeiT-III B/16 models pruned with Wanda. To assess interpretability comprehensively, …
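For context on the pruning method: Wanda scores each weight by its magnitude times the L2 norm of the corresponding input activation (computed over a small calibration set), then zeroes the lowest-scoring weights within each output row, with no retraining. Below is a minimal sketch of that criterion for a single linear layer, assuming PyTorch; the function name and signature are illustrative, not the paper's code.

```python
import torch

def wanda_prune(weight: torch.Tensor, activations: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Apply the Wanda pruning criterion to one linear layer (sketch).

    weight:      (out_features, in_features) weight matrix
    activations: (num_tokens, in_features) calibration inputs to this layer
    sparsity:    fraction of weights to zero out, e.g. 0.5
    """
    # Wanda score for weight W_ij: |W_ij| * ||X_j||_2, where X_j collects
    # the j-th input feature across all calibration tokens.
    act_norm = activations.norm(p=2, dim=0)      # (in_features,)
    scores = weight.abs() * act_norm             # (out_features, in_features)

    # Prune per output row: zero the lowest-scoring fraction within each row,
    # so every output neuron keeps the same number of incoming weights.
    k = int(weight.shape[1] * sparsity)
    _, prune_idx = torch.topk(scores, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, prune_idx, False)
    return weight * mask
```

The per-row comparison group is a key design choice in Wanda: it keeps each output neuron's fan-in uniformly sparse rather than letting whole rows be pruned away, which is part of why the method works without any fine-tuning.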