MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.
March 24, 2026
Original Paper
The Anatomy of an Edit: Mechanism-Guided Activation Steering for Knowledge Editing
arXiv · 2603.20795
The Takeaway
Most knowledge editing methods modify model weights, which can lead to catastrophic forgetting or instability. By using post-edit attribution to guide activation steering, this method enables reliable, architecture-agnostic edits that can be applied or reversed at inference time without retraining.
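To see why steering-based edits are easy to reverse, here is a minimal sketch of the general idea (a toy illustration, not the paper's actual method): an edit is a steering vector added to a layer's hidden state at inference time, so removing the edit is just subtracting the same vector. The vectors, layer, and scale `alpha` below are all hypothetical.

```python
# Toy sketch of reversible activation steering: the "edit" is a vector
# added to a hidden state at inference; subtracting it undoes the edit.

def apply_steering(hidden, steering, alpha=1.0):
    """Add a scaled steering vector to a hidden-state vector."""
    return [h + alpha * s for h, s in zip(hidden, steering)]

def reverse_steering(hidden, steering, alpha=1.0):
    """Undo the edit by subtracting the same scaled vector."""
    return [h - alpha * s for h, s in zip(hidden, steering)]

hidden = [0.2, -1.0, 0.5]     # toy hidden state at the edited layer
steering = [0.1, 0.4, -0.2]   # toy edit direction (e.g. derived from attribution)

edited = apply_steering(hidden, steering, alpha=2.0)
restored = reverse_steering(edited, steering, alpha=2.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(restored, hidden))
```

Because the base weights are never touched, applying or undoing an edit is a cheap vector operation, which is what makes this approach architecture-agnostic and retraining-free.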
From the abstract
Large language models (LLMs) are increasingly used as knowledge bases, but keeping them up to date requires targeted knowledge editing (KE). However, it remains unclear how edits are implemented inside the model once applied. In this work, we take a mechanistic view of KE using neuron-level knowledge attribution (NLKA). Unlike prior work that focuses on pre-edit causal tracing and localization, we use post-edit attribution -- contrasting successful and failed edits -- to isolate the computations