Provides empirical evidence and a mechanistic explanation for why LoRA drastically reduces catastrophic forgetting in sequential fine-tuning compared to full fine-tuning.
March 31, 2026
Original Paper
Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes
arXiv · 2603.27707
The Takeaway
Quantifies the gap: LoRA (r=8) limits average forgetting to ~0.6%, versus ~20% under full fine-tuning. For practitioners, this justifies using LoRA not just for compute savings but as a primary strategy for preserving base-model capabilities during multi-task sequential adaptation.
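As a concrete reading of those percentages, here is a minimal sketch of one common forgetting metric (assumed here; the paper's exact definition may differ): for each earlier task, the drop between accuracy measured right after that task was learned and accuracy after the full sequence, averaged over tasks. The numbers below are illustrative only, not taken from the paper's tables.

```python
def average_forgetting(acc_after_task, acc_final):
    """Average accuracy drop across earlier tasks.

    acc_after_task[t]: accuracy on task t right after it was trained.
    acc_final[t]: accuracy on task t after the whole task sequence.
    """
    drops = [after - final for after, final in zip(acc_after_task, acc_final)]
    return sum(drops) / len(drops)

# Illustrative numbers only (not from the paper):
full_ft = average_forgetting([0.70, 0.85, 0.55], [0.50, 0.66, 0.34])  # ~0.20
lora    = average_forgetting([0.70, 0.85, 0.55], [0.69, 0.85, 0.55])  # ~0.003
```

Under this definition, a "~20% vs ~0.6%" result means full fine-tuning loses about a fifth of earlier-task accuracy on average, while LoRA's earlier-task scores are nearly unchanged.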
From the abstract
Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with companion representation probes that test a frozen-backbone explanation of its robustness. In five full-validation BERT-base reruns on an RTE->MRPC->CoLA->SST-2 sequence, full fine-tuni…
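The frozen-backbone explanation hinges on the structure of the LoRA update itself. A minimal sketch, assuming the standard LoRA formulation (this is not code from the paper): the pretrained weight W stays frozen while only a low-rank delta B @ A is trained, with B zero-initialized, so the adapted layer exactly matches the base model at initialization and the base model can always be recovered by dropping the adapter.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 8  # toy sizes; the paper uses r=8 on BERT-base

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection, small init
B = np.zeros((d_out, r))                # trainable up-projection, zero init

def lora_forward(x):
    # Frozen base path plus scaled low-rank path: x W^T + (alpha/r) x A^T B^T.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(3, d_in))
# With B = 0 at init, the adapted layer is identical to the frozen base layer.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because each task's update lives in a rank-r subspace on top of an untouched W, sequential tasks cannot overwrite the backbone's weights directly, which is the mechanism the paper's frozen-backbone probes are designed to test.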