An open release of a multilingual embedding family (80M to 14B parameters) covering 200+ languages and ranking first on 11 MTEB benchmarks.
March 20, 2026
Original Paper
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
arXiv · 2603.19223
The Takeaway
F2LLM-v2 democratizes high-performance, general-purpose embeddings for low-resource languages and ships a full pipeline (weights, data, code) for production-grade retrieval and RAG systems at eight model sizes, from 80M to 14B parameters.
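For a sense of how such a release typically slots into a retrieval stack, here is a minimal sketch using Hugging Face `transformers` with mean pooling and cosine similarity. The checkpoint name `org/f2llm-v2-80m` and the pooling choice are assumptions for illustration, not details confirmed by the paper.

```python
# Minimal retrieval sketch (assumed checkpoint name and pooling strategy).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "org/f2llm-v2-80m"  # hypothetical repo id, not from the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool last hidden states over non-padding tokens, then L2-normalize."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state    # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)     # (batch, seq, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)    # mean over real tokens only
    return F.normalize(pooled, dim=-1)               # unit vectors for cosine sim

docs = [
    "Berlin is the capital of Germany.",
    "The mitochondria is the powerhouse of the cell.",
]
query = embed(["What is the capital of Germany?"])
scores = query @ embed(docs).T                       # cosine similarities
print(scores)  # the first document should score highest
```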
From the abstract
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, …
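Matryoshka learning trains embeddings whose leading dimensions remain useful on their own, so a consumer can truncate vectors for cheaper storage and faster search. A minimal sketch of that consumption pattern, assuming unit-normalized full-dimension vectors like those produced above (dimensions here are illustrative):

```python
import torch
import torch.nn.functional as F

def truncate_matryoshka(emb: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` dimensions and renormalize so cosine
    similarity stays well-defined at the reduced size."""
    return F.normalize(emb[..., :dim], dim=-1)

full = F.normalize(torch.randn(2, 1024), dim=-1)  # stand-in for model output
small = truncate_matryoshka(full, 256)            # ~4x smaller index footprint
print(small.shape)  # torch.Size([2, 256])
```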