Concept

Time Series Forecasting

The task of predicting future values of a sequence by learning patterns in historical data - applied everywhere from weather prediction to stock prices to energy demand to server load forecasting.

Added May 18, 2026

A time series is any sequence of observations indexed in time: daily sales, hourly electricity demand, monthly unemployment, second-by-second stock prices, annual rainfall, real-time server CPU usage. Time series forecasting predicts future values from historical patterns. It is one of the most widely applicable ML tasks, with uses in virtually every industry.

Classical statistical methods dominated time series for decades. ARIMA (AutoRegressive Integrated Moving Average) differences the series until it is approximately stationary (the "integrated" part), then models the current differenced value as a linear function of recent past values and recent forecast errors. Exponential smoothing forecasts from a weighted average of past observations in which the weights decay exponentially with age; the Holt-Winters variant adds trend and seasonal components. Seasonal decomposition separates a time series into trend, seasonality, and residual components. Prophet, released by Facebook, wraps a generalised additive model with automatic seasonality and changepoint detection in an interface aimed at business forecasting.
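
As a rough illustration of the exponentially decaying weights described above, here is a minimal sketch of simple exponential smoothing in plain Python. It is the most basic member of the family (no trend or seasonal terms, so it is not Holt-Winters), and the function names and the choice to initialise the level with the first observation are assumptions made for this example.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    alpha close to 1 tracks recent values closely; alpha close to 0
    averages over a long history.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start the level at the first observation
    levels = [level]
    for value in series[1:]:
        # New level = weighted blend of the new observation and the old level.
        level = alpha * value + (1.0 - alpha) * level
        levels.append(level)
    return levels


def forecast(series, alpha, horizon):
    """Simple exponential smoothing forecasts are flat at the last level."""
    last_level = simple_exponential_smoothing(series, alpha)[-1]
    return [last_level] * horizon


demand = [10.0, 12.0, 11.0, 13.0]
print(forecast(demand, alpha=0.5, horizon=2))  # prints [12.0, 12.0]
```

Unrolling the recursion shows that an observation k steps old receives weight alpha * (1 - alpha)^k, which is where the name comes from.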

Neural network approaches to time series began with RNNs and LSTMs, which model sequential dependencies by maintaining a hidden state that captures information from prior time steps. LSTMs use gating to mitigate the vanishing gradient problem that made vanilla RNNs ineffective for long-range dependencies. Temporal Convolutional Networks (TCN) apply dilated causal convolutions: each output depends only on current and past inputs, and the gaps between filter taps widen with depth, giving efficient long-range temporal modelling with parallelisable computation. WaveNet (DeepMind, originally for audio) demonstrated that dilated convolutions could capture patterns at multiple timescales efficiently.
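
To make the idea of a dilated causal convolution concrete, the sketch below implements a single one-dimensional layer in plain Python, together with a helper that computes the receptive field of a stack of such layers. The function names are hypothetical, and the zero left-padding is one common convention assumed here, not something prescribed by the text above.

```python
def causal_dilated_conv(series, kernel, dilation):
    """output[t] = sum over k of kernel[k] * series[t - k * dilation].

    Only the current and past inputs are used (causal). Positions before
    the start of the series are treated as zero (left padding).
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * series[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
# With kernel [1, 1] and dilation 2, each output is x[t] + x[t - 2].
print(causal_dilated_conv(x, kernel=[1.0, 1.0], dilation=2))  # prints [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))  # prints 16
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of parameters grows only linearly.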

Following their success in NLP, Transformer-based models have become a prominent line of research in time series forecasting. Informer, Autoformer, FEDformer, and PatchTST adapt the Transformer architecture for time series, each reducing in its own way the quadratic cost of applying full attention to long sequences. PatchTST divides the time series into patches (analogous to image patches in vision Transformers) and applies attention over patch tokens rather than individual time steps; its authors reported state-of-the-art results on long-horizon forecasting benchmarks at the time of publication.
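
The patching step can be sketched in a few lines. The example below only shows how a series is cut into fixed-length, possibly overlapping patches; it is not the PatchTST implementation and omits the end padding, normalisation, and embedding layers that a real model would use. The names make_patches, patch_len, and stride are assumptions chosen for this illustration.

```python
def make_patches(series, patch_len, stride):
    """Split a series into patches of length patch_len taken every stride steps.

    Trailing values that do not fill a complete patch are dropped in this sketch.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[start:start + patch_len]
            for start in range(0, last_start + 1, stride)]


series = list(range(12))  # stand-in for 12 observed time steps
patches = make_patches(series, patch_len=4, stride=2)
print(len(patches))   # prints 5
print(patches[0])     # prints [0, 1, 2, 3]
print(patches[-1])    # prints [8, 9, 10, 11]
```

Because attention cost grows with the square of the number of tokens, attending over 5 patch tokens instead of 12 time steps reduces the number of pairwise interactions from 144 to 25 in this toy case.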

N-BEATS and N-HiTS are deep learning architectures designed specifically for time series that use neither recurrence nor attention. N-BEATS stacks fully connected blocks with residual connections: each block subtracts the part of the input it has explained and contributes a partial forecast. N-HiTS adds multi-rate sampling of the input and hierarchical interpolation of the output. TimesFM and Moirai are recent foundation model approaches: pretrained on billions of time points from diverse domains, they can achieve strong zero-shot forecasting on new time series without task-specific fine-tuning.

Probabilistic forecasting extends point forecasts to full predictive distributions, quantifying uncertainty. DeepAR (Amazon) trains autoregressive models that output distribution parameters at each step. Normalising flow-based models and diffusion models for time series generate full probabilistic forecast trajectories.
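
The sketch below illustrates the general recipe of autoregressive probabilistic forecasting: predict distribution parameters for the next step, draw a sample, feed it back in, and repeat to obtain many possible trajectories. It is not DeepAR itself. The function toy_step_distribution is a hypothetical stand-in for a trained network (here a Gaussian random walk with fixed standard deviation), and the other names are likewise assumptions for this example.

```python
import random


def toy_step_distribution(history):
    """Hypothetical stand-in for a trained model: (mean, std) of the next value."""
    return history[-1], 1.0


def sample_paths(history, horizon, n_paths, seed=0):
    """Draw n_paths future trajectories by sampling one step at a time."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        extended = list(history)
        for _ in range(horizon):
            mean, std = toy_step_distribution(extended)
            extended.append(rng.gauss(mean, std))  # feed the sample back in
        paths.append(extended[len(history):])
    return paths


def empirical_quantile(values, q):
    """Nearest-rank style quantile of a list of samples."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def forecast_intervals(paths, lower=0.1, upper=0.9):
    """Per-step (lower, median, upper) quantiles across the sampled paths."""
    bands = []
    for step in range(len(paths[0])):
        column = [path[step] for path in paths]
        bands.append((empirical_quantile(column, lower),
                      empirical_quantile(column, 0.5),
                      empirical_quantile(column, upper)))
    return bands


paths = sample_paths([5.0, 6.0], horizon=3, n_paths=200)
for lo, mid, hi in forecast_intervals(paths):
    print(f"80% interval: [{lo:.2f}, {hi:.2f}], median {mid:.2f}")
```

With a model like this toy random walk, the intervals typically widen as the horizon grows, since uncertainty from each sampled step accumulates in the next.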

Key challenges: non-stationarity (the statistical properties of the series change over time), multiple seasonalities (daily, weekly, annual patterns superimposed), irregular sampling, missing data, and distributional shift when exogenous events (economic shocks, policy changes) alter the underlying data generating process.
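
One common, simple response to non-stationarity and seasonality is differencing: model the change from one step (or one season) to the next instead of the raw level, then invert the transform after forecasting. The sketch below shows the transform and its inverse; the function names are hypothetical, and differencing is only one of several possible remedies, chosen here for illustration.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag.

    lag=1 removes a linear trend; lag equal to the seasonal period
    removes a stable seasonal pattern.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Invert difference(), given the first lag values of the original series."""
    rebuilt = list(first_values)
    for d in diffs:
        # The value lag steps back plus the stored change recovers the original.
        rebuilt.append(rebuilt[-lag] + d)
    return rebuilt


trend = [3, 5, 7, 9, 11]
print(difference(trend))  # prints [2, 2, 2, 2]

seasonal = [1, 2, 3, 4, 2, 3, 4, 5]  # period 4 with a slow upward drift
print(difference(seasonal, lag=4))  # prints [1, 1, 1, 1]
print(undifference(seasonal[:4], difference(seasonal, lag=4), lag=4) == seasonal)  # prints True
```

This is the same operation as the "integrated" step in ARIMA: forecasts are made on the differenced series and then accumulated back to the original scale.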

Analogy

Weather forecasting. A meteorologist combines knowledge of historical weather patterns (statistical regularities), current atmospheric measurements (present state), and physical models of how weather systems evolve (dynamics) to predict tomorrow's weather with a probability estimate. Time series forecasting does the same for any sequence: it extracts historical patterns, incorporates current state, and projects forward, ideally with an uncertainty estimate that reflects how far ahead the forecast horizon extends.

Real-world example

DeepAR, developed at Amazon, targets settings such as retail demand forecasting across millions of products. Fitting and maintaining a separate classical model for each product is costly at that scale and cannot share information between items. DeepAR is instead trained globally across all products, so it can learn shared patterns (Black Friday demand spikes, seasonal effects, product lifecycle patterns) while also conditioning on each product's own history. A single trained model can also generate probabilistic forecasts for cold-start products (new items with little or no history) by leveraging patterns learned from similar items.

Why it matters

Time series forecasting is one of the most economically consequential ML applications: accurate demand forecasting reduces inventory costs and stockouts, accurate energy load forecasting enables grid efficiency and renewable integration, and accurate financial forecasting informs investment decisions. Understanding the methods, from classical ARIMA to deep learning Transformers to probabilistic forecasting, is essential for data scientists in virtually any industry vertical.

Related concepts

Anomaly Detection, Model Monitoring
