AI & ML Breaks Assumption

Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection while actually improving translation quality.

arXiv · March 13, 2026 · 2603.12021

Thennal D K, Chris Biemann, Hans Ole Hatzel

Why it matters

The paper challenges the prevailing view that performing translation and label projection jointly degrades translation quality. If the results hold, practitioners could replace multi-step cross-lingual transfer pipelines with a simpler, more accurate approach: fine-tuning a translation model to carry basic XML markers through translation.
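To make the idea of XML markers concrete, here is a minimal sketch of the bookkeeping around the translation step: wrapping annotated spans in tags before translation and reading the spans back out of the tagged output afterward. It is an illustration under stated assumptions, not the LabelPigeon implementation. The function names `tag_spans` and `extract_spans`, the tag names `PER` and `LOC`, and the example sentences are all hypothetical, and the translation itself is replaced by a hand-written stand-in string.

```python
import re

def tag_spans(text, spans):
    """Wrap each (start, end, label) character span in <label>...</label>.

    Assumption: spans do not overlap; this sketch rejects overlapping input.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

# Matches a non-nested <label>...</label> pair; nested tags are out of scope here.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def extract_spans(tagged):
    """Strip the tags and return (plain_text, [(start, end, label), ...])."""
    pieces = []
    spans = []
    cursor = 0   # position in the tagged string
    length = 0   # position in the plain string built so far
    for match in TAG_RE.finditer(tagged):
        before = tagged[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        inner = match.group(2)
        spans.append((length, length + len(inner), match.group(1)))
        pieces.append(inner)
        length += len(inner)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans

# Source sentence with character-offset annotations (hypothetical example).
source = "Angela Merkel visited Paris."
source_spans = [(0, 13, "PER"), (22, 27, "LOC")]
tagged_source = tag_spans(source, source_spans)

# Hand-written stand-in for what a tag-preserving translation model might return.
tagged_target = "<PER>Angela Merkel</PER> besuchte <LOC>Paris</LOC>."
target_text, target_spans = extract_spans(tagged_target)
```

In this sketch the projected labels come straight from the tags in the translated string, so no separate alignment step between source and target is needed. Whether a given model keeps the tags intact is exactly what has to be evaluated.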

From the abstract

Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label pro