Provides empirical evidence and a mechanistic explanation for why LoRA drastically reduces catastrophic forgetting in sequential fine-tuning compared to full fine-tuning.
March 31, 2026
Original Paper
Low-Rank Adaptation Reduces Catastrophic Forgetting in Sequential Transformer Encoder Fine-Tuning: Controlled Empirical Evidence and Frozen-Backbone Representation Probes
arXiv · 2603.27707
The Takeaway
Quantifies the gap: LoRA (r=8) limits average forgetting to ~0.6%, versus ~20% under full fine-tuning. For practitioners, this justifies using LoRA not just for compute savings but as a primary strategy for preserving base-model capabilities during multi-task sequential adaptation.
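As a concrete reading of those percentages, here is a minimal sketch of one common forgetting metric (assumed here; the paper's exact definition may differ): for each earlier task, the drop between accuracy measured right after that task was learned and accuracy after the full sequence, averaged over tasks. The numbers below are illustrative only, not taken from the paper's tables.

```python
def average_forgetting(acc_after_task, acc_final):
    """Average accuracy drop across earlier tasks.

    acc_after_task[t]: accuracy on task t right after it was trained.
    acc_final[t]: accuracy on task t after the whole task sequence.
    """
    drops = [after - final for after, final in zip(acc_after_task, acc_final)]
    return sum(drops) / len(drops)

# Illustrative numbers only (not from the paper):
full_ft = average_forgetting([0.70, 0.85, 0.55], [0.50, 0.66, 0.34])  # ~0.20
lora    = average_forgetting([0.70, 0.85, 0.55], [0.69, 0.85, 0.55])  # ~0.003
```

Under this definition, a "~20% vs ~0.6%" result means full fine-tuning loses about a fifth of earlier-task accuracy on average, while LoRA's earlier-task scores are nearly unchanged.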
From the abstract
Sequential fine-tuning of pretrained language encoders often overwrites previously acquired capabilities, but the forgetting behavior of parameter-efficient updates remains under-characterized. We present a controlled empirical study of Low-Rank Adaptation (LoRA) in sequential transformer encoder fine-tuning with companion representation probes that test a frozen-backbone explanation of its robustness. In five full-validation BERT-base reruns on an RTE->MRPC->CoLA->SST-2 sequence, full fine-tuni…
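The frozen-backbone explanation hinges on the structure of the LoRA update itself. A minimal sketch, assuming the standard LoRA formulation (this is not code from the paper): the pretrained weight W stays frozen while only a low-rank delta B @ A is trained, with B zero-initialized, so the adapted layer exactly matches the base model at initialization and the base model can always be recovered by dropping the adapter.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 8  # toy sizes; the paper uses r=8 on BERT-base

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection, small init
B = np.zeros((d_out, r))                # trainable up-projection, zero init

def lora_forward(x):
    # Frozen base path plus scaled low-rank path: x W^T + (alpha/r) x A^T B^T.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(3, d_in))
# With B = 0 at init, the adapted layer is identical to the frozen base layer.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because each task's update lives in a rank-r subspace on top of an untouched W, sequential tasks cannot overwrite the backbone's weights directly, which is the mechanism the paper's frozen-backbone probes are designed to test.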