AI & ML Efficiency Breakthrough

Sparton is a specialized Triton kernel that solves the massive memory bottleneck of Learned Sparse Retrieval (LSR) models like Splade.

March 27, 2026

Original Paper

Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval

Thong Nguyen, Cosimo Rulli, Franco Maria Nardini, Rossano Venturini, Andrew Yates

arXiv · 2603.25011

The Takeaway

Sparton achieves a 4.8x speedup and a 10x reduction in memory by performing an early online reduction directly on logit tiles, rather than materializing the full logit matrix. This lets practitioners scale LSR models to massive vocabularies and larger batch sizes that were previously hardware-prohibitive.
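To make the idea concrete, here is a minimal NumPy sketch of an early online reduction (function and parameter names are illustrative, not the paper's API): rather than building the full (L, V) matrix of logits, compute one sequence tile at a time, apply the element-wise operations, and fold each tile into a running max before discarding it.

```python
import numpy as np

def sparse_rep_tiled(hidden, W, tile=32):
    """Compute max over the sequence of log1p(relu(hidden @ W))
    without materializing the full (L, V) logit matrix.

    hidden: (L, d) token hidden states
    W:      (d, V) LM-head projection (bias omitted for brevity)
    """
    L, d = hidden.shape
    V = W.shape[1]
    out = np.zeros(V)  # running max; activations are >= 0 after ReLU
    for start in range(0, L, tile):
        logits = hidden[start:start + tile] @ W        # only (tile, V) live
        acts = np.log1p(np.maximum(logits, 0.0))       # ReLU + Log1P
        np.maximum(out, acts.max(axis=0), out=out)     # fold into running max
    return out
```

The result is identical to the naive full-matrix computation, but peak intermediate memory drops from O(L x V) to O(tile x V); Sparton performs this fusion inside a single Triton kernel rather than a Python loop.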

From the abstract

State-of-the-art Learned Sparse Retrieval (LSR) models, such as Splade, typically employ a Language Modeling (LM) head to project latent hidden states into a lexically-anchored logit matrix. This intermediate matrix is subsequently transformed into a sparse lexical representation through element-wise operations (ReLU, Log1P) and max-pooling over the sequence dimension. Despite its effectiveness, the LM head creates a massive memory bottleneck due to the sheer size of the vocabulary (V), which ca…
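The pipeline the abstract describes can be sketched in a few lines of NumPy (names are illustrative); the bottleneck is the intermediate (L, V) logit matrix, which grows with both sequence length and vocabulary size.

```python
import numpy as np

def splade_head_naive(hidden, W):
    """Naive LSR head: project hidden states through the LM head to a
    full (L, V) logit matrix, apply ReLU and Log1P element-wise, then
    max-pool over the sequence dimension. Peak memory is O(L * V)."""
    logits = hidden @ W                       # (L, V): the memory bottleneck
    acts = np.log1p(np.maximum(logits, 0.0))  # element-wise ReLU + Log1P
    return acts.max(axis=0)                   # (V,) sparse lexical vector
```

With a BERT-scale vocabulary (~30k terms) this intermediate is tolerable, but at the massive vocabularies the paper targets it dominates GPU memory, which is exactly what Sparton's fused kernel avoids.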