AI & ML Scaling Insight

A billion-scale time-series benchmark that identifies a 'context-length crossover' where foundation models start to crush deep learning baselines.

March 30, 2026

Original Paper

QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

Siqiao Xue, Zhaoyang Zhu, Wei Zhang, Rongyao Cai, Rui Wang, Yixiang Mu, Fan Zhou, Jianguo Li, Peng Di, Hang Yu

arXiv · 2603.26017

The Takeaway

The benchmark reveals a context-length crossover: at short context lengths (L=96), small deep learning models are more efficient, but foundation models scale markedly better as context grows and pull ahead once L>576. The result is a regime-aware map that lets practitioners choose between architecture families based on the characteristics of their data.
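
Read as a decision rule, the crossover suggests a simple heuristic. Here is a minimal sketch: the L=576 crossover comes straight from the takeaway above, while the function name and the model-family labels are illustrative placeholders, not prescriptions from the paper.

```python
# Hypothetical helper illustrating the context-length crossover as a
# decision rule. The 576 threshold is the crossover reported above; the
# family labels are placeholders, not models named by the paper.
def pick_model_family(context_length: int, crossover: int = 576) -> str:
    if context_length <= crossover:
        return "small deep-learning baseline"  # more efficient at short contexts (e.g. L=96)
    return "time-series foundation model"      # scales better once L > 576

for L in (96, 576, 1024):
    print(f"L={L}: {pick_model_family(L)}")
```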

From the abstract

Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce QuitoBench, a regime-balanced benchmark for time series forecasting with coverage across eight trend × seasonality × forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels. […]
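
The excerpt does not spell out how a series gets assigned to one of the eight TSF regimes. One plausible reading, sketched below, is to threshold three standard diagnostics: STL-based trend and seasonality strengths and a spectral-entropy forecastability score. The function name tsf_regime, the 0.5 cutoffs, and the choice of diagnostics are all assumptions for illustration, not details from the paper.

```python
# A minimal sketch of binning a series into one of eight TSF regimes.
# NOTE: QuitoBench's actual criteria are not given in the excerpt; the
# STL strength measures and spectral-entropy forecastability below are
# common heuristics, and the 0.5 cutoffs are assumed.
import numpy as np
from scipy.signal import periodogram
from statsmodels.tsa.seasonal import STL

def tsf_regime(y: np.ndarray, period: int = 24, cutoff: float = 0.5) -> str:
    """Label a series, e.g. 'T+S+F-' for trend / seasonality / forecastability."""
    res = STL(y, period=period).fit()
    resid_var = np.var(res.resid)
    # Strength of trend and seasonality (Hyndman-style features, in [0, 1]).
    trend_strength = max(0.0, 1.0 - resid_var / np.var(res.trend + res.resid))
    seas_strength = max(0.0, 1.0 - resid_var / np.var(res.seasonal + res.resid))
    # Forecastability = 1 - normalized spectral entropy of the series.
    _, psd = periodogram(y)
    p = psd[psd > 0]
    p = p / p.sum()
    forecastability = 1.0 + (p * np.log(p)).sum() / np.log(len(p))
    bits = (trend_strength > cutoff, seas_strength > cutoff, forecastability > cutoff)
    return "T{}S{}F{}".format(*("+" if b else "-" for b in bits))

# Example: a noisy daily-seasonal series with an upward drift.
rng = np.random.default_rng(0)
t = np.arange(500)
y = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.3 * rng.standard_normal(500)
print(tsf_regime(y, period=24))
```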