Enables training of CNNs from scratch in true 4-bit precision on commodity CPUs with virtually no loss in accuracy.
March 17, 2026
Original Paper
True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity
arXiv · 2603.13931
The Takeaway
This work demonstrates 8x memory compression (4-bit versus 32-bit weights) while matching full-precision accuracy on standard benchmarks. It democratizes deep learning by letting practitioners train capable models on consumer-grade hardware such as mobile phones or free-tier CPU instances.
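As a sanity check on the 8x figure: a float32 weight occupies 4 bytes, while true 4-bit weights can be packed two per byte. The sketch below illustrates this generic bit-packing arithmetic in PyTorch; it is only an illustration of where 8x comes from, not the paper's actual storage scheme, and the function name is ours.

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    """Pack a flat tensor of 4-bit codes (integers 0..15) two per byte."""
    assert q.numel() % 2 == 0 and int(q.max()) < 16
    q = q.to(torch.uint8).view(-1, 2)
    return q[:, 0] | (q[:, 1] << 4)      # low nibble | high nibble

codes = torch.randint(0, 16, (1_000_000,))
packed = pack_int4(codes)

fp32_bytes = codes.numel() * 4           # float32 storage: 4 bytes per weight
int4_bytes = packed.numel()              # packed storage: 1 byte per 2 weights
print(fp32_bytes / int4_bytes)           # -> 8.0, the claimed compression ratio
```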
From the abstract
Low-precision neural network training has emerged as a promising direction for reducing computational costs and democratizing access to deep learning research. However, existing 4-bit quantization methods either rely on expensive GPU infrastructure or suffer from significant accuracy degradation. In this work, we present a practical method for training convolutional neural networks at true 4-bit precision using standard PyTorch operations on commodity CPUs. We introduce a novel tanh-based soft w […]
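The excerpt cuts off before the abstract names its quantizer in full, so the following is only a guess at what a tanh-based soft quantizer for 4-bit training might look like using standard PyTorch operations. Every name and detail below is an assumption for illustration, not the paper's method: tanh smoothly squashes weights into (-1, 1) so gradients stay informative, and a straight-through estimator lets gradients bypass the non-differentiable rounding step.

```python
import torch

def soft_quantize_4bit(w: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Hypothetical tanh-based soft quantizer: 16 levels (4 bits) in [-1, 1]."""
    levels = 2 ** 4 - 1                     # 15 intervals -> 16 levels
    w_soft = torch.tanh(alpha * w)          # smooth squash into (-1, 1)
    w_scaled = (w_soft + 1) / 2 * levels    # map to [0, 15]
    w_rounded = torch.round(w_scaled)       # discretize to integer levels
    # Straight-through estimator: forward pass uses the rounded values,
    # backward pass treats rounding as the identity.
    w_q = w_scaled + (w_rounded - w_scaled).detach()
    return w_q / levels * 2 - 1             # rescale back to [-1, 1]

# Usage: quantize a conv layer's weights during the forward pass.
w = torch.randn(16, 3, 3, 3, requires_grad=True)
w_q = soft_quantize_4bit(w)
w_q.sum().backward()                        # gradients flow via the STE
print(w_q.unique().numel())                 # at most 16 distinct values
```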