AI & ML Efficiency Breakthrough

Reduces human annotation requirements for NLP model testing by up to 95%.

March 24, 2026

Original Paper

Select, Label, Evaluate: Active Testing in NLP

Antonio Purificato, Maria Sofia Bucarelli, Andrea Bacciu, Amin Mantrach, Fabrizio Silvestri

arXiv · 2603.21840

The Takeaway

By formalizing 'Active Testing,' the paper provides a framework for selecting only the most informative samples for evaluation. This lets practitioners estimate model performance reliably (within 1% of full test-set evaluation) while drastically reducing the cost of annotating a high-quality test set.
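The core idea can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's method: it samples test points in proportion to an uncertainty-based acquisition score, labels only those, and debiases the resulting accuracy estimate with importance weights. The function name, the 1 - confidence acquisition score, and the sampling scheme are all assumptions chosen for clarity.

```python
import numpy as np

def active_test_estimate(probs, true_labels, budget, rng=None):
    """Estimate a classifier's accuracy while labeling only `budget` samples.

    Hypothetical sketch of active testing: test points are drawn with
    probability proportional to an acquisition score (here, predictive
    uncertainty), and the accuracy estimate is debiased with importance
    weights so its expectation matches the full-test-set accuracy.
    """
    rng = np.random.default_rng(rng)
    n = len(probs)
    # Acquisition score: prefer samples the model is least confident about.
    confidence = probs.max(axis=1)
    score = 1.0 - confidence + 1e-6
    q = score / score.sum()                  # sampling distribution over the pool
    idx = rng.choice(n, size=budget, replace=True, p=q)
    preds = probs.argmax(axis=1)
    correct = (preds[idx] == true_labels[idx]).astype(float)
    # Importance weights correct for the non-uniform sampling.
    weights = 1.0 / (n * q[idx])
    return float(np.mean(weights * correct))
```

Under this scheme the estimator is unbiased for any acquisition score, so the score only affects variance; concentrating labels where the model is uncertain is what lets a small budget match a much larger uniformly labeled test set.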

From the abstract

Human annotation cost and time remain significant bottlenecks in Natural Language Processing (NLP), with test data annotation being particularly expensive due to the stringent requirement for low-error, high-quality labels necessary for reliable model evaluation. Traditional approaches require annotating entire test sets, leading to substantial resource requirements. Active Testing is a framework that selects the most informative test samples for annotation. Given a labeling budget, it aims to […]