Spectral Compact Training (SCT) enables training 70B-parameter architectures on consumer hardware like the Steam Deck (16GB RAM) via permanent SVD factors.
April 2, 2026
Original Paper
Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
arXiv · 2604.00733
The Takeaway
By never materializing dense weight matrices and instead training directly on Stiefel manifolds, SCT achieves a ~200x memory reduction for MLP layers. This shifts the bottleneck of LLM training from memory capacity to learning rate schedules, enabling massive-model experimentation on commodity GPUs.
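To see where a ~200x figure could come from, here is the parameter-count arithmetic for one factorized MLP weight. The dimensions and rank below are illustrative assumptions, not values from the paper:

```python
# Rough memory arithmetic for a rank-r factorization W = U diag(s) V^T of an
# MLP weight. Dimensions and rank are hypothetical, chosen only to show how a
# ~200x reduction can arise.
d_model, d_ff = 8192, 4 * 8192   # assumed transformer MLP dims
rank = 32                         # assumed truncation rank

dense_params = d_model * d_ff                 # full W: d_model x d_ff
sct_params = rank * (d_model + d_ff) + rank   # U (d_model x r), V (d_ff x r), s (r)

print(f"dense: {dense_params:,} params")      # 268,435,456
print(f"SCT:   {sct_params:,} params")        # 1,310,752
print(f"reduction: {dense_params / sct_params:.0f}x")
```

At these sizes the compact factors are roughly 205x smaller than the dense matrix, consistent in magnitude with the paper's claim.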
From the abstract
The memory wall remains the primary bottleneck for training large language models on consumer hardware. We introduce Spectral Compact Training (SCT), a method that replaces dense weight matrices with permanent truncated SVD factors W = U diag(s) V^T, where the full dense matrix is never materialized during training or inference. Gradients flow through the compact spectral factors via standard backpropagation, and U, V are retracted to the Stiefel manifold via QR decomposition after each optimizer step.
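The two mechanisms in the abstract, a forward pass through the compact factors and QR retraction back to the Stiefel manifold, can be sketched in NumPy. This is a minimal illustration under assumed shapes, not the paper's training loop; backpropagation through the factors is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 8  # assumed layer shape and rank

# Compact spectral factors: W = U @ diag(s) @ V.T, never materialized.
U = np.linalg.qr(rng.standard_normal((d_out, r)))[0]  # orthonormal columns
V = np.linalg.qr(rng.standard_normal((d_in, r)))[0]
s = rng.random(r) + 0.1

def forward(x):
    # y = U diag(s) V^T x, evaluated right-to-left so only the
    # r-dimensional intermediate is ever formed.
    return U @ (s * (V.T @ x))

def qr_retract(M):
    # Map a perturbed factor back onto the Stiefel manifold via QR;
    # the sign correction makes the factorization unique.
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))

# Simulated optimizer step: perturb each factor, then retract.
U = qr_retract(U + 0.01 * rng.standard_normal(U.shape))
V = qr_retract(V + 0.01 * rng.standard_normal(V.shape))

# Columns are orthonormal again after retraction.
print(np.allclose(U.T @ U, np.eye(r), atol=1e-8))
print(np.allclose(V.T @ V, np.eye(r), atol=1e-8))
```

Evaluating the product right-to-left keeps the per-layer activation cost at O((d_in + d_out) r) instead of O(d_in d_out), which is the mechanism behind the memory savings described above.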