AI & ML Efficiency Breakthrough

Enables Gaussian Processes to scale on modern parallel hardware by removing the need for Cholesky decompositions.

April 2, 2026

Original Paper

Inverse-Free Sparse Variational Gaussian Processes

Stefano Cortinovis, Laurence Aitchison, Stefanos Eleftheriadis, Mark van der Wilk

arXiv · 2604.00697

The Takeaway

Standard sparse variational GPs (SVGPs) rely on matrix inversions and Cholesky decompositions that are poorly suited to GPUs and TPUs. The paper derives a matmul-only natural-gradient update, which lets GPs be trained as drop-in components in deep learning pipelines without the traditional computational bottleneck, while remaining stable at scale.
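
To see why matrix multiplications can stand in for an inverse, here is one standard construction of this kind, given purely as an illustrative sketch (the paper's exact bound and parameterisation may differ): for a positive-definite kernel matrix $K$ and any symmetric auxiliary matrix $T$,

$$
K^{-1} - \bigl(2T - T K T\bigr) \;=\; \bigl(K^{-1} - T\bigr)\, K\, \bigl(K^{-1} - T\bigr) \;\succeq\; 0,
$$

so $x^\top K^{-1} x \ge x^\top (2T - T K T)\, x$, with equality when $T = K^{-1}$. The right-hand side involves only matmuls, so it can be evaluated and differentiated on accelerators in low precision, with $T$ treated as an extra variational parameter to optimise, in the spirit of the auxiliary matrix parameter mentioned in the abstract below.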

From the abstract

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision, massively parallel hardware. While one can construct valid variational bounds that rely only on matrix multiplications (matmuls) via an auxiliary matrix parameter, optimising them with off-the-shelf first-order methods is challenging. We make the inverse-free approach […]
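
As a concrete, deliberately simplified illustration of the matmul-only idea the abstract refers to, the NumPy sketch below compares a Cholesky-based evaluation of the quadratic form x^T K^{-1} x with the auxiliary-matrix lower bound shown above. The toy kernel matrix, the choices of T, and all names are illustrative assumptions; this is not the paper's implementation or its exact bound.

```python
# Minimal sketch: a quadratic form involving K^{-1}, which standard SVGP bounds
# compute via a Cholesky solve, can be lower-bounded using only matmuls with an
# auxiliary symmetric matrix T (equality when T = K^{-1}). Generic construction
# for illustration only, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric positive-definite "kernel" matrix K.
m = 50
Z = rng.standard_normal((m, m))
K = Z @ Z.T + 1e-3 * np.eye(m)
x = rng.standard_normal(m)

# Reference value via a Cholesky solve (the step that is awkward on accelerators).
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, x))  # K^{-1} x
exact = x @ alpha

# Matmul-only lower bound with an auxiliary symmetric matrix T:
#   x^T K^{-1} x >= x^T (2T - T K T) x   for any symmetric T,
# since K^{-1} - (2T - T K T) = (K^{-1} - T) K (K^{-1} - T) is PSD.
def matmul_bound(T, K, x):
    return x @ ((2 * T - T @ K @ T) @ x)

T_rough = np.eye(m) * (m / np.trace(K))  # crude guess: scaled identity
T_exact = np.linalg.inv(K)               # equality case, used only as a check

print("Cholesky-based value :", exact)
print("bound with rough T   :", matmul_bound(T_rough, K, x))  # <= exact
print("bound with T = K^-1  :", matmul_bound(T_exact, K, x))  # == exact
```

In an actual inverse-free SVGP objective the auxiliary matrix would be optimised alongside the other variational parameters; the point of the sketch is only that the bounded expression contains nothing but matmuls, so it maps cleanly onto GPU/TPU kernels and automatic differentiation.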