Decoupled language models reduce the compute required for OCR domain adaptation by 95% while approaching SOTA transformer accuracy.
March 31, 2026
Original Paper
Efficient Domain Adaptation for Text Line Recognition via Decoupled Language Models
arXiv · 2603.28028
The Takeaway
By separating visual character detection from linguistic correction, this framework allows for annotation-free adaptation to historical or specialized documents on a single GPU. It democratizes high-performance OCR for practitioners who cannot afford hundreds of GPU hours for end-to-end training.
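A minimal sketch of what such a detection-and-correction split might look like: a visual detector emits per-character probability distributions, and a small character-level language model rescores hypotheses via beam search. Everything here is illustrative, not from the paper — the variable names, the toy bigram LM, and the probability values are assumptions standing in for the framework's trained components.

```python
import math

# Hypothetical detector output: per-position character probability
# distributions for a three-character line, with one visually
# ambiguous position ("o" vs "a"). Values are illustrative.
detector_probs = [
    {"c": 0.9, "e": 0.1},
    {"o": 0.6, "a": 0.4},
    {"t": 0.95, "l": 0.05},
]

# Stand-in character bigram LM (log-probabilities). In a decoupled
# framework, only this component is adapted to a new domain.
bigram_logp = {
    ("c", "a"): math.log(0.7), ("c", "o"): math.log(0.3),
    ("a", "t"): math.log(0.8), ("o", "t"): math.log(0.2),
}
UNSEEN = math.log(1e-6)  # back-off score for unseen bigrams

def decode(probs, lm, lm_weight=1.0, beam=4):
    """Beam search combining visual and linguistic log-scores."""
    beams = [("", 0.0)]
    for dist in probs:
        candidates = []
        for prefix, score in beams:
            for ch, p in dist.items():
                s = score + math.log(p)
                if prefix:  # add LM transition score after position 0
                    s += lm_weight * lm.get((prefix[-1], ch), UNSEEN)
                candidates.append((prefix + ch, s))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam]
    return beams[0][0]

print(decode(detector_probs, bigram_logp, lm_weight=0.0))  # visual-only: "cot"
print(decode(detector_probs, bigram_logp))                 # with LM: "cat"
```

The point of the decoupling is visible in the last two lines: the visual scores alone prefer the misreading, and swapping in (or retraining) only the lightweight language component flips the decision without touching the detector.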
From the abstract
Optical character recognition remains critical infrastructure for document digitization, yet state-of-the-art performance is often restricted to well-resourced institutions by prohibitive computational barriers. End-to-end transformer architectures achieve strong accuracy but demand hundreds of GPU hours for domain adaptation, limiting accessibility for practitioners and digital humanities scholars. We present a modular detection-and-correction framework that achieves near-SOTA accuracy with single-GPU adaptation.