Mamba-3 introduces MIMO formulations and complex-valued updates to solve the state-tracking failures of previous linear models.
arXiv · March 17, 2026 · 2603.15569
The Takeaway
Standard SSMs struggle with retrieval and state tracking; this architectural refinement addresses those weaknesses while keeping sub-quadratic efficiency. It is a significant step toward making non-Transformer models competitive on complex reasoning tasks.
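To see why complex-valued (rotational) state updates matter for state tracking, here is a toy one-dimensional illustration, not Mamba-3's actual parameterization: an input-gated linear recurrence with eigenvalue -1 (a rotation by pi on the unit circle) tracks the parity of a bit stream exactly, while a non-negative real eigenvalue only decays and cannot.

```python
def run_ssm(eigenvalue, bits):
    """Toy input-gated linear recurrence (illustrative, not Mamba-3):
    h_t = eigenvalue * h_{t-1} if x_t == 1, else h_t = h_{t-1}."""
    h = 1.0 + 0j
    for x in bits:
        if x == 1:
            h = eigenvalue * h
    return h

bits = [1, 0, 1, 1, 0, 1]  # four ones -> parity 0

# Rotation by pi: the state is (-1)^(number of ones), so the sign of the
# state reads out parity exactly, for sequences of any length.
h_rot = run_ssm(-1.0 + 0j, bits)
parity = 0 if h_rot.real > 0 else 1   # -> 0

# Non-negative real eigenvalue: the state decays monotonically with the
# count of ones (0.5**4 here), so it encodes the count, never the count
# mod 2 -- no fixed linear readout recovers parity at all lengths.
h_real = run_ssm(0.5 + 0j, bits)
```

Parity is the simplest example of the state-tracking tasks the paper targets; the same rotation argument extends to counting modulo k with eigenvalues at k-th roots of unity.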
From the abstract
Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While current Transformer-based models deliver strong model quality, their quadratic compute and linear memory make inference expensive. This has spurred the development of sub-quadratic models with linear compute and constant memory requirements. However, many recent linear models trade off model quality and capabilities…
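For readers unfamiliar with the MIMO framing in the headline, a generic multi-input multi-output state-space recurrence looks like the sketch below (a textbook formulation under assumed shapes, not Mamba-3's exact parameterization): instead of one scalar state per input channel, an n-dimensional state reads from and writes to m channels at once, increasing state capacity per scan step.

```python
import numpy as np

# Generic MIMO state-space recurrence:
#   h_t = A h_{t-1} + B x_t,   y_t = C h_t
# with x_t in R^m, h_t in R^n, y_t in R^m. Shapes and initialization here
# are illustrative choices, not the paper's.
n, m, T = 8, 4, 16
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.5, 0.99, n))   # stable diagonal state transition
B = rng.standard_normal((n, m)) * 0.1    # input matrix: m channels -> n states
C = rng.standard_normal((m, n)) * 0.1    # output matrix: n states -> m channels

x = rng.standard_normal((T, m))          # length-T input sequence
h = np.zeros(n)
ys = []
for t in range(T):
    h = A @ h + B @ x[t]                 # state update mixes all m inputs
    ys.append(C @ h)                     # readout mixes all n states
y = np.stack(ys)                         # (T, m) output sequence
```

In the SISO special case (m = 1 per channel, diagonal A), each state dimension evolves independently; the MIMO view lets the same state budget serve multiple channels, which is part of how the paper keeps compute sub-quadratic while raising capacity.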