AI & ML New Capability

Omnilingual MT scales machine translation to over 1,600 languages, an 8x increase in coverage over previous state-of-the-art systems.

arXiv · March 18, 2026 · 2603.16309

Omnilingual MT Team, Belen Alastruey, Niyati Bafna, Andrea Caciolai, Kevin Heffernan, Artyom Kozhevnikov, Christophe Ropers, Eduardo Sánchez, Charles-Eric Saint-James, Ioannis Tsiamas, Chierh Cheng, Joe Chuang, Paul-Ambroise Duquenne, Mark Duppenthaler, Nate Ekberg, Cynthia Gao, Pere Lluís Huguet Cabot, João Maria Janeiro, Jean Maillard, Gabriel Mejia Gonzalez, Holger Schwenk, Edan Toledo, Arina Turkatenko, Albert Ventayol-Boada, Rashel Moritz, Alexandre Mourachko, Surya Parimi, Mary Williamson, Shireen Yates, David Dale, Marta R. Costa-jussà

The Takeaway

This represents a massive leap in language coverage, particularly for under-supported languages, showing that specialized LLMs at much smaller scales (1B-8B parameters) can outperform 70B-parameter models. It provides a blueprint for supporting the 'long tail' of human languages previously ignored by machine translation research.

From the abstract

High-quality machine translation (MT) can scale to hundreds of languages, setting a high bar for multilingual systems. However, compared to the world's 7,000 languages, current systems still offer only limited coverage: about 200 languages on the target side, and perhaps a few hundred more on the source side, supported via cross-lingual transfer. Even these numbers have been hard to evaluate due to the lack of reliable benchmarks and metrics. We present Omnilingual Machine Translation (OMT) …