SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.
March 23, 2026
Original Paper
SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia
arXiv · 2603.19931
The Takeaway
SAGE shifts the focus from 'big data' to 'right data' for culturally attuned translation. By using an agent to autonomously curate high-quality expert dialogues, it democratizes LLM fine-tuning for communities where massive datasets are unavailable or too expensive to process.
From the abstract
The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer a potential solution for translation, their deployment in data-poor contexts faces a dual challenge: the scarcity of high-quality, culturally relevant data and the prohibitive energy costs of training on massive, noisy web corpora. To resolve the tension between digital inclusion and environmental sustainability […]
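To make the 'right data over big data' idea concrete, here is a minimal, hypothetical sketch of agent-guided curation: an epsilon-greedy selector that keeps only a small budget of high-reward training examples instead of the full pool. The names (`curate`, `reward_fn`) and the epsilon-greedy policy are illustrative assumptions, not the paper's actual method.

```python
import random

def curate(candidates, reward_fn, budget, epsilon=0.2, seed=0):
    """Select a small, high-reward subset of candidate dialogues.

    A toy stand-in for RL-guided curation: rather than fine-tuning on
    the whole pool, keep only `budget` examples, mostly exploiting the
    highest-reward ones but occasionally exploring random picks.
    `reward_fn` is a hypothetical quality scorer (e.g. an agent's
    judgment of cultural fidelity).
    """
    rng = random.Random(seed)
    # Score every candidate once, then sort best-first.
    pool = sorted(candidates, key=reward_fn, reverse=True)
    selected = []
    while pool and len(selected) < budget:
        if rng.random() < epsilon:
            idx = rng.randrange(len(pool))  # explore: random pick
        else:
            idx = 0                         # exploit: best remaining
        selected.append(pool.pop(idx))
    return selected

# Example: keep 2 of 3 dialogues, using length as a stand-in reward.
kept = curate(["a", "bbb", "cc"], reward_fn=len, budget=2, epsilon=0.0)
```

With `epsilon=0.0` the selector is purely greedy and keeps the two highest-scoring examples; a nonzero epsilon trades a little quality for coverage of the long tail, which matters when the reward model itself is imperfect.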