Spectral Compact Training (SCT) enables training 70B-parameter architectures on consumer hardware like the Steam Deck (16GB RAM) via permanent SVD factors.
April 2, 2026
Original Paper
Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
arXiv · 2604.00733
The Takeaway
By never materializing dense weight matrices and instead training directly on Stiefel manifolds, SCT achieves a ~200x memory reduction for MLP layers. This shifts the bottleneck of LLM training from memory capacity to learning rate schedules, enabling massive-model experimentation on commodity GPUs.
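To see where a ~200x figure could come from, here is the parameter-count arithmetic for one factorized MLP weight. The dimensions and rank below are illustrative assumptions, not values from the paper:

```python
# Rough memory arithmetic for a rank-r factorization W = U diag(s) V^T of an
# MLP weight. Dimensions and rank are hypothetical, chosen only to show how a
# ~200x reduction can arise.
d_model, d_ff = 8192, 4 * 8192   # assumed transformer MLP dims
rank = 32                         # assumed truncation rank

dense_params = d_model * d_ff                 # full W: d_model x d_ff
sct_params = rank * (d_model + d_ff) + rank   # U (d_model x r), V (d_ff x r), s (r)

print(f"dense: {dense_params:,} params")      # 268,435,456
print(f"SCT:   {sct_params:,} params")        # 1,310,752
print(f"reduction: {dense_params / sct_params:.0f}x")
```

At these sizes the compact factors are roughly 205x smaller than the dense matrix, consistent in magnitude with the paper's claim.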
From the abstract
The memory wall remains the primary bottleneck for training large language models on consumer hardware. We introduce Spectral Compact Training (SCT), a method that replaces dense weight matrices with permanent truncated SVD factors W = U diag(s) V^T, where the full dense matrix is never materialized during training or inference. Gradients flow through the compact spectral factors via standard backpropagation, and U, V are retracted to the Stiefel manifold via QR decomposition after each optimizer step.
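The two mechanisms in the abstract, a forward pass through the compact factors and QR retraction back to the Stiefel manifold, can be sketched in NumPy. This is a minimal illustration under assumed shapes, not the paper's training loop; backpropagation through the factors is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 8  # assumed layer shape and rank

# Compact spectral factors: W = U @ diag(s) @ V.T, never materialized.
U = np.linalg.qr(rng.standard_normal((d_out, r)))[0]  # orthonormal columns
V = np.linalg.qr(rng.standard_normal((d_in, r)))[0]
s = rng.random(r) + 0.1

def forward(x):
    # y = U diag(s) V^T x, evaluated right-to-left so only the
    # r-dimensional intermediate is ever formed.
    return U @ (s * (V.T @ x))

def qr_retract(M):
    # Map a perturbed factor back onto the Stiefel manifold via QR;
    # the sign correction makes the factorization unique.
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# Simulated optimizer step: perturb each factor, then retract.
U = qr_retract(U + 0.01 * rng.standard_normal(U.shape))
V = qr_retract(V + 0.01 * rng.standard_normal(V.shape))

# Columns are orthonormal again after retraction.
print(np.allclose(U.T @ U, np.eye(r), atol=1e-8))
print(np.allclose(V.T @ V, np.eye(r), atol=1e-8))
```

Evaluating the product right-to-left keeps the per-layer activation cost at O((d_in + d_out) r) instead of O(d_in d_out), which is the mechanism behind the memory savings described above.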