TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
arXiv · March 13, 2026 · 2603.11352
Why it matters
It solves the fundamental trade-off between temporal fidelity and computational cost in time-series Transformers. By adaptively allocating tokens based on signal complexity, it enables much more efficient pre-training of large-scale temporal models.
From the abstract
Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence…
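The core idea of content-aware patching can be illustrated with a toy sketch: measure local signal complexity (here, the absolute first difference, an assumption for illustration, not the paper's actual criterion) and place patch boundaries so each patch carries roughly equal complexity. Flat regions then get long patches and volatile regions get short ones, concentrating tokens where the dynamics are informative.

```python
import numpy as np

def dynamic_patches(x, target_patches=8):
    """Toy content-aware patching: cut the series so each patch carries
    roughly equal 'complexity' (sum of absolute first differences).
    Illustrative sketch only -- not the TimeSqueeze algorithm itself."""
    # Per-point complexity score; small epsilon keeps flat regions nonzero.
    complexity = np.abs(np.diff(x, prepend=x[0])) + 1e-8
    cum = np.cumsum(complexity)
    # Equal-complexity quantile cuts become the patch boundaries.
    levels = np.linspace(0, cum[-1], target_patches + 1)
    bounds = np.searchsorted(cum, levels[1:-1], side="right")
    bounds = np.unique(np.concatenate(([0], bounds, [len(x)])))
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]

# A signal that is flat, then oscillates: the flat half should be covered
# by one long patch, while the oscillating half gets many short patches.
t = np.linspace(0, 1, 200)
x = np.concatenate([np.zeros(100), np.sin(40 * t[:100])])
patches = dynamic_patches(x, target_patches=8)
lengths = [b - a for a, b in patches]
```

Running this, the first patch spans the entire flat region while later patches shrink over the oscillation, which is the behavior uniform fixed-length patching cannot provide.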