Uses an asymmetric Draft-Verify-Recover pipeline to enable high-quality personalized AI assistants without compromising user privacy.
arXiv · March 18, 2026 · 2603.16219
The Takeaway
SpecSteer lets a small on-device model handle private user history while a cloud model verifies the reasoning, using a modified speculative decoding protocol. The authors report a 2.36x speedup, and the split makes possible personalized intelligence that would otherwise be too computationally expensive for local devices or too privacy-sensitive to send to the cloud.
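The split described above can be illustrated with a greedy speculative decoding loop. This is a minimal sketch under stated assumptions, not SpecSteer's algorithm. The `draft_verify_recover` function, the `Drafter`/`Verifier` interfaces, the `draft_len` parameter, the toy models, and the reading of "Recover" as the standard speculative-decoding correction step (keep the verifier's token at the first mismatch) are all hypothetical. The drafter sees the private history; the verifier sees only the prompt and the drafted tokens.

```python
from typing import Callable, List, Optional, Sequence

Token = str
# Hypothetical interfaces, assumed for illustration (not SpecSteer's API):
# the on-device drafter sees the private history; the cloud verifier does not.
Drafter = Callable[[Sequence[Token], Sequence[Token]], Token]
Verifier = Callable[[Sequence[Token]], Token]


def draft_verify_recover(
    private_history: Sequence[Token],
    prompt: Sequence[Token],
    drafter: Drafter,
    verifier: Verifier,
    draft_len: int = 4,
    max_new_tokens: int = 16,
    eos: Token = "<eos>",
) -> List[Token]:
    """Greedy draft/verify loop; only the prompt and drafted tokens reach the verifier."""
    output: List[Token] = []
    while len(output) < max_new_tokens:
        # Draft: the local model proposes up to draft_len tokens using private history.
        draft: List[Token] = []
        for _ in range(draft_len):
            tok = drafter(private_history, list(prompt) + output + draft)
            draft.append(tok)
            if tok == eos:
                break
        # Verify: the cloud model checks drafted tokens left to right on the shared
        # prefix. Standard speculative decoding scores all positions in one forward
        # pass; the sequential loop here is only for clarity.
        accepted: List[Token] = []
        correction: Optional[Token] = None
        for tok in draft:
            expected = verifier(list(prompt) + output + accepted)
            if tok != expected:
                correction = expected
                break
            accepted.append(tok)
        output.extend(accepted)
        # Recover (assumed: the standard correction step): keep the verifier's
        # token at the first mismatch, then redraft from that point.
        if correction is not None:
            output.append(correction)
        if output[-1] == eos:
            break
    return output[:max_new_tokens]


# Toy stand-ins (hypothetical): both models follow a fixed reference sentence,
# but the drafter swaps in a preference read from the private history.
reference = ["Your", "order", "of", "oat", "latte", "is", "ready", "<eos>"]


def toy_verifier(prefix: Sequence[Token]) -> Token:
    i = len(prefix)
    return reference[i] if i < len(reference) else "<eos>"


def toy_drafter(history: Sequence[Token], prefix: Sequence[Token]) -> Token:
    tok = toy_verifier(prefix)
    return history[0] if tok == "oat" else tok


result = draft_verify_recover(["soy"], [], toy_drafter, toy_verifier)
print(result)  # ['Your', 'order', 'of', 'oat', 'latte', 'is', 'ready', '<eos>']
```

In this naive version the history-derived token ("soy") is rejected because the verifier never sees the history, so the cloud model's generic choice wins. That tension is one plausible reason a plain speculative decoding protocol would need modifying in this setting; the summary does not describe how SpecSteer's protocol differs. Note also that drafted tokens still reach the cloud, so the sketch keeps only the raw history local.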
From the abstract
Realizing personalized intelligence faces a core dilemma: sending user history to centralized large language models raises privacy concerns, while on-device small language models lack the reasoning capacity required for high-quality generation. Our pilot study shows that purely local enhancements remain insufficient to reliably bridge this gap. We therefore propose SpecSteer, an asymmetric collaborative inference framework that synergizes private on-device context with cloud-scale reasoning. …