DebugLM allows developers to trace an LLM's specific behaviors back to individual training data sources.
arXiv · March 19, 2026 · 2603.17884
The Takeaway
By embedding provenance tags into the model during training, DebugLM enables precise debugging and targeted test-time remediation (such as selective refusal) without retraining. This closes a major observability gap in the multi-stage pipelines used to build foundation models.
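The provenance-tag idea can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the tag format, source names, and refusal logic are all hypothetical stand-ins for whatever DebugLM actually learns.

```python
# Hypothetical sketch of data provenance tagging and selective refusal.
# Tag scheme, source names, and blocklist are illustrative assumptions.

PROVENANCE_TAGS = {
    "web_crawl": "<src:web>",
    "code": "<src:code>",
    "forums": "<src:forum>",
}

def tag_example(text: str, source: str) -> str:
    """Prepend a provenance token so the model can learn to associate
    behaviors with the data source they came from."""
    return f"{PROVENANCE_TAGS[source]} {text}"

# Sources a developer has flagged during debugging.
BLOCKED_SOURCES = {"forums"}

def selective_refusal(attributed_sources: list[str], response: str) -> str:
    """Refuse only when the behavior traces back to a flagged source,
    leaving everything else untouched -- no retraining needed."""
    if any(s in BLOCKED_SOURCES for s in attributed_sources):
        return "[refused: attributed to a flagged data source]"
    return response

print(tag_example("def add(a, b): return a + b", "code"))
# -> <src:code> def add(a, b): return a + b
print(selective_refusal(["web_crawl"], "harmless answer"))
# -> harmless answer
print(selective_refusal(["forums"], "problematic answer"))
# -> [refused: attributed to a flagged data source]
```

The point of the sketch is the targeting: remediation applies only to behavior attributed to a flagged source, rather than blunt model-wide patching.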
From the abstract
Large language models (LLMs) are trained through multi-stage pipelines over heterogeneous data sources, yet developers lack a principled way to pinpoint the specific data responsible for an observed behavior. This lack of observability reduces debugging to reactive patching and makes failures prone to recur under distribution shift or subsequent model updates. To address this limitation, we propose DebugLM, a framework that equips LLMs with built-in data provenance, enabling them to explicitly t…