Digital Mathematical Notebook

Time Series 📈

Time series are ordered observations whose temporal dependence changes what “data analysis” even means. Once order matters, we have to think about stochastic processes, lagged dependence, stationarity, forecast uncertainty, and model validation in a different way than for i.i.d. data.

This page builds from stochastic foundations to ACF and PACF, ARIMA and SARIMA, decomposition, ETS, state-space models, Kalman filtering, multivariate extensions, spectral ideas, and modern machine learning for sequential data.

Ordered dependence is the object, not an inconvenience added after the fact.
A time-series model has to separate signal, dependence, latent structure, and uncertainty while preserving temporal order throughout fitting and evaluation.

Why this matters

Forecasting

Energy demand, sales, inventory, traffic, and climate planning all rely on dependence-aware prediction.

Monitoring

Industrial sensors, medical signals, and infrastructure streams are judged by change through time, not isolated points.

Economics

Macroeconomic indicators, volatility, and interventions require dynamic rather than static reasoning.

Science

Measurements from astronomy, neuroscience, and climate science are often generated by latent dynamical systems.

Language and events

Counts, arrivals, clicks, and token streams carry temporal dependence that breaks i.i.d. assumptions.

Uncertainty

A useful forecast is not only a point estimate but also a calibrated picture of what may happen next.

Section 1 • Ordered dependence before modeling

What a Time Series Is

A time series is not only a list of values. Formally, one usually begins with a stochastic process $\{X_t : t \in \mathcal{T}\}$ indexed by time, and then one observes one realization $x_1,\dots,x_T$ of that process. The process is the probabilistic object; the observed sequence is a sample path.

DefinitionBox

Process versus realization

In discrete time, $\mathcal{T}$ is often $\mathbb{Z}$ or $\{1,\dots,T\}$. The variables $X_t$ are random, while the observed values $x_t$ are fixed once the sample has been recorded. Many statistical statements are about the process even though only one realization is available.

ExampleBox

Univariate versus multivariate

Daily temperature is a univariate series. Electricity demand, temperature, and humidity observed together form a multivariate series $Y_t \in \mathbb{R}^m$.

RemarkBox

Discrete and continuous time

This page focuses mainly on discrete-time models. Continuous-time processes require different tools and are not simply the same formulas with smaller time steps.

KeyIdeaBox

Why i.i.d. intuition fails

In a time series, nearby observations may carry information about each other. Shuffling the data destroys precisely the dependence structure that most forecasting, filtering, and signal-extraction methods are trying to learn.

Section 2 • Level, trend, seasonality, cycles, shocks

Core Structure of Time Series

A useful descriptive decomposition separates persistent structure from residual variation, but the parts should not be confused. Trend, seasonality, cycles, calendar effects, interventions, and noise describe different mechanisms.

DefinitionBox

Level and trend

Level is the local baseline. Trend is a smoother long-run change in the mean, which may be deterministic or stochastic.

DefinitionBox

Seasonality and cycles

Seasonality repeats with a known or fixed period, while cycles are recurrent but not locked to a single calendar frequency.

DefinitionBox

Shocks and interventions

Holidays, policy changes, outages, and structural breaks are not interchangeable with trend or seasonality; they often need explicit intervention terms or regime-sensitive modeling.

RemarkBox

Additive and multiplicative viewpoints

Additive thinking uses $y_t = \ell_t + s_t + r_t$ when the scale of fluctuations is roughly constant. Multiplicative thinking uses $y_t = \ell_t s_t r_t$ when variability grows with the level; logs often convert multiplicative structure into an additive one.
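
As a concrete sketch of the two viewpoints (assuming numpy, pandas, and statsmodels are available; the monthly series and its parameters are synthetic and purely illustrative), the snippet below decomposes a series whose seasonal swings grow with the level, once directly as multiplicative and once additively after taking logs.

```python
# A minimal sketch, not a recommended pipeline: the synthetic series is an
# illustrative assumption, and statsmodels is assumed available.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(120)
level = 50 + 0.3 * t                                  # slow upward trend
season = 1 + 0.2 * np.sin(2 * np.pi * t / 12)         # multiplicative seasonal factor
y = pd.Series(level * season * np.exp(rng.normal(0, 0.02, t.size)),
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

# Multiplicative structure: decompose directly, or take logs and decompose additively.
mult = seasonal_decompose(y, model="multiplicative", period=12)
add_on_logs = seasonal_decompose(np.log(y), model="additive", period=12)

print(mult.seasonal.head(12))         # seasonal factors close to 1
print(add_on_logs.seasonal.head(12))  # log-scale seasonal offsets close to 0
```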

InteractiveWidgetShell

Decomposition explorer

Section 3 • Covariance structure and stationarity

Stochastic Foundations

The classical time-series vocabulary is built from moments indexed by lag. The lag operator $B X_t = X_{t-1}$ helps encode models compactly, while covariance functions explain what “dependence across time” means in second-order terms.

FormulaBlock

Moments and lagged dependence

$$ \mu_t = \mathbb{E}[X_t], \qquad \gamma_t(h) = \mathrm{Cov}(X_t, X_{t+h}), \qquad \rho_t(h) = \frac{\gamma_t(h)}{\sqrt{\gamma_t(0)\gamma_{t+h}(0)}}. $$

Weak stationarity asks for $\mu_t \equiv \mu$ and $\gamma_t(h) = \gamma(h)$ independent of $t$. Strict stationarity is stronger: the joint distribution of $(X_{t_1},\dots,X_{t_k})$ is invariant under time shifts.
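
To make the second-order quantities concrete, here is a small plain-numpy sketch (the AR(1) simulation and its coefficient are illustrative) that estimates the lag-$h$ autocovariance and autocorrelation from a single realization under the weak-stationarity assumption.

```python
# A minimal sketch with synthetic data: estimate gamma(h) and rho(h) from one realization.
import numpy as np

def sample_autocov(x, h):
    """Conventional 1/T autocovariance estimator at lag h for a 1-D array x."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    return np.sum((x[: x.size - h] - xbar) * (x[h:] - xbar)) / x.size

rng = np.random.default_rng(11)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()   # AR(1) with phi = 0.7

gamma0 = sample_autocov(x, 0)
for h in range(5):
    print(h, round(sample_autocov(x, h) / gamma0, 3))   # decays roughly like 0.7**h
```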

DefinitionBox

White noise

A white-noise sequence usually means zero mean, constant variance, and zero autocovariance for all nonzero lags. It need not be independent unless extra assumptions are added.

RemarkBox

i.i.d. noise is stronger

Independent and identically distributed noise is white noise, but the converse can fail. Gaussian white noise is a special case where uncorrelatedness and independence coincide.

More on weak stationarity, strict stationarity, and ergodic intuition

Weak stationarity is a second-order statement, so it is enough for covariance-based tools such as ACF, spectral density, linear prediction, and many classical ARMA results. Strict stationarity controls full joint distributions and is therefore stronger. Ergodic intuition explains why averages computed along one long realization can sometimes approximate ensemble averages, but stationarity alone does not automatically guarantee every ergodic property.

Section 4 • Diagnostics by lag

Dependence, ACF, and PACF

The autocorrelation function summarizes linear dependence between values $h$ periods apart. The partial autocorrelation at lag $h$ asks what remains after linearly accounting for the shorter lags in between.

FormulaBlock

ACF

$$ \rho(h) = \frac{\gamma(h)}{\gamma(0)} $$

for a weakly stationary process. Sample autocorrelations estimate these lagged correlations from one realization.

DefinitionBox

PACF

The lag-$h$ PACF is the correlation between $X_t$ and $X_{t-h}$ after linearly removing the effect of $X_{t-1},\dots,X_{t-h+1}$. It is often estimated through regression or Levinson-Durbin recursions.

RemarkBox

Use the cutoff story carefully

The textbook heuristic “AR gives PACF cutoff, MA gives ACF cutoff” is a useful first guide for idealized low-order models, not a law of nature. Finite samples, mixed models, seasonal terms, and transformations all blur the picture.
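
A small simulation makes the qualified heuristic concrete. The sketch below assumes statsmodels is installed; the AR(2) coefficients and the sample size are illustrative.

```python
# A minimal sketch, assuming statsmodels: simulate an AR(2) and compare the sample
# ACF (which tails off) with the sample PACF (which roughly cuts off after lag 2).
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

np.random.seed(1)
# Lag-polynomial coefficients: phi(z) = 1 - 0.6 z - 0.25 z^2, theta(z) = 1.
ar_poly = np.array([1.0, -0.6, -0.25])
ma_poly = np.array([1.0])
x = ArmaProcess(ar_poly, ma_poly).generate_sample(nsample=500)

sample_acf = acf(x, nlags=10)
sample_pacf = pacf(x, nlags=10)
for h in range(1, 11):
    print(f"lag {h:2d}  acf {sample_acf[h]: .3f}  pacf {sample_pacf[h]: .3f}")
```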

InteractiveWidgetShell

ACF / PACF simulator

Section 5 • What transformations change, and what they do not

Stationarity, Differencing, and Transformations

Stationarity matters because many models describe dependence through constant lag structure. But not every nonstationarity should be attacked the same way: variance stabilization, detrending, differencing, and seasonal adjustment are distinct operations.

DefinitionBox

Deterministic trend

A smooth mean function $m_t$ can be removed by regression or decomposition when the trend is modeled as a deterministic component.

DefinitionBox

Stochastic trend

A unit-root process such as a random walk carries nonstationarity in the accumulation of shocks; differencing addresses a different mechanism from simple detrending.

FormulaBlock

Differencing operators

$$ \nabla X_t = (1-B)X_t, \qquad \nabla_s X_t = (1-B^s)X_t. $$

RemarkBox

Do not conflate detrending, differencing, and seasonal adjustment

Detrending removes an estimated smooth mean. Differencing removes persistent low-frequency behavior by applying lag operators. Seasonal adjustment targets periodic structure. A log or Box-Cox style transform instead acts on scale, not on temporal dependence directly.
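
A short pandas sketch keeps the operations distinct (the synthetic series is illustrative): a log transform acts on scale, while first and seasonal differences act on temporal structure.

```python
# A minimal sketch with a synthetic monthly-style series: each line below is a
# different operation, not a different flavor of the same one.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
t = np.arange(144)
y = pd.Series(np.exp(0.01 * t
                     + 0.1 * np.sin(2 * np.pi * t / 12)
                     + rng.normal(0, 0.05, t.size).cumsum()))

log_y = np.log(y)               # stabilize the scale (variability grows with the level)
d1 = log_y.diff()               # (1 - B): targets the stochastic trend
d12 = log_y.diff(12)            # (1 - B^12): targets period-12 structure
d1_d12 = log_y.diff().diff(12)  # both, as in an "airline"-style specification

print(log_y.std(), d1.std(), d12.std(), d1_d12.std())
```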

InteractiveWidgetShell

Differencing and stationarity demo

Section 6 • Before the heavy models

Classical Baselines and Task Distinctions

Strong time-series practice begins with baselines and with clean task definitions. Forecasting, filtering, smoothing, and imputation are related but not identical.

ExampleBox

Forecasting

Predict future values $X_{T+h}$ using information available up to time $T$.

ExampleBox

Filtering and smoothing

Estimate latent states from noisy data, either using past data only or the entire sample.

ExampleBox

Imputation

Fill missing observations while respecting temporal structure and uncertainty.

DefinitionBox

Baseline models

White noise, mean forecast, naive forecast, seasonal naive forecast, and drift forecast are not trivialities. They tell us whether a more elaborate model is actually learning structure beyond persistence and season repetition.
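
The baselines are easy to state precisely. The plain-numpy sketch below implements the mean, naive, seasonal naive, and drift forecasts; the horizon, seasonal period, and example series are illustrative choices.

```python
# A minimal sketch of the classical baselines for an h-step forecast from history y.
import numpy as np

def baseline_forecasts(y, h, season=12):
    """Return simple benchmark forecasts from the 1-D array y."""
    y = np.asarray(y, dtype=float)
    T = y.size
    mean_fc = np.full(h, y.mean())                       # unconditional mean
    naive_fc = np.full(h, y[-1])                         # repeat the last value
    snaive_fc = np.array([y[T - season + (i % season)] for i in range(h)])  # repeat last season
    slope = (y[-1] - y[0]) / (T - 1)                     # drift: line through first and last point
    drift_fc = y[-1] + slope * np.arange(1, h + 1)
    return {"mean": mean_fc, "naive": naive_fc, "seasonal_naive": snaive_fc, "drift": drift_fc}

y = np.sin(2 * np.pi * np.arange(60) / 12) + 0.05 * np.arange(60)
print(baseline_forecasts(y, h=6)["seasonal_naive"])
```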

Section 7 • Linear stochastic models

AR, MA, and ARMA Models

Autoregressive and moving-average models describe dependence through lag polynomials. They are simple enough to analyze and rich enough to explain many practical diagnostics.

FormulaBlock

AR($p$)

$$ X_t = c + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t. $$

Stationarity is tied to the roots of $1-\phi_1 z-\cdots-\phi_p z^p$: they must lie outside the unit circle.

FormulaBlock

MA($q$)

$$ X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}. $$

Invertibility concerns whether the model can be uniquely represented in terms of past observations. The roots of $1+\theta_1 z+\cdots+\theta_q z^q$ must lie outside the unit circle.

RemarkBox

Stationarity and invertibility are not the same condition

AR stationarity controls whether the process is well-behaved in time. MA invertibility controls whether past shocks can be recovered from the observed process in a stable way. They solve different identifiability problems.

Why the root condition matters

The lag-polynomial view rewrites an AR process as $\phi(B)X_t = c + \varepsilon_t$. If $\phi(z)$ has roots on or inside the unit circle, the formal inverse $\phi(B)^{-1}$ does not generate an absolutely summable impulse response, so shocks do not decay in a stationary way. The MA invertibility condition is the analogous requirement for expressing the model as a stable infinite AR representation.
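
A numeric check of the root conditions is a useful habit. The sketch below uses plain numpy (the AR and MA coefficients are illustrative) to evaluate the roots of the lag polynomials directly.

```python
# A minimal sketch: check the root conditions numerically.
# polyroots takes coefficients in increasing degree: phi(z) = 1 - phi1 z - ... - phip z^p.
import numpy as np
from numpy.polynomial import polynomial as P

def roots_outside_unit_circle(coeffs):
    """coeffs are lag-polynomial coefficients [1, c1, ..., ck] in increasing degree."""
    roots = P.polyroots(coeffs)
    return roots, bool(np.all(np.abs(roots) > 1.0))

# AR(2): X_t = 0.6 X_{t-1} + 0.25 X_{t-2} + eps_t  ->  phi(z) = 1 - 0.6 z - 0.25 z^2
ar_roots, ar_stationary = roots_outside_unit_circle([1.0, -0.6, -0.25])
# MA(1): X_t = eps_t + 0.5 eps_{t-1}               ->  theta(z) = 1 + 0.5 z
ma_roots, ma_invertible = roots_outside_unit_circle([1.0, 0.5])

print("AR roots:", ar_roots, "stationary:", ar_stationary)
print("MA roots:", ma_roots, "invertible:", ma_invertible)
```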

Section 8 • Building practical models from lag operators

ARIMA, SARIMA, and the Box–Jenkins Workflow

ARIMA extends ARMA by modeling the differenced series. Seasonal ARIMA adds seasonal differencing and seasonal lag polynomials. The “I” stands for “integrated”: the model describes the differenced series, and forecasts are integrated (cumulatively summed) back to the original scale.

FormulaBlock

ARIMA and SARIMA notation

$$ \Phi(B^s)\phi(B)\nabla^d \nabla_s^D X_t = c + \Theta(B^s)\theta(B)\varepsilon_t. $$

Here $(p,d,q)$ describe nonseasonal AR, differencing, and MA orders, while $(P,D,Q)_s$ describe seasonal terms with period $s$.

KeyIdeaBox

Identification

Inspect the series, choose transformations, decide whether differencing is needed, then use residual diagnostics and likelihood-based comparison rather than ACF/PACF alone.

RemarkBox

Diagnostics

A fitted model should leave residuals behaving like white noise relative to the structure you intended to capture. Good in-sample fit is not enough if the residual dependence remains.
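
As a sketch of the fit-then-diagnose loop (assuming statsmodels; the synthetic monthly series and the ARIMA(1,1,1)(0,1,1)_12 specification are illustrative, not a recommendation), the snippet below fits a seasonal ARIMA, applies a Ljung-Box test to the residuals, and produces interval forecasts.

```python
# A minimal sketch, assuming statsmodels: fit a seasonal ARIMA and check whether
# the residuals still show lagged dependence (Ljung-Box).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
t = np.arange(144)
y = pd.Series(10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size),
              index=pd.date_range("2012-01-01", periods=144, freq="MS"))

# An illustrative specification, not a recommendation.
model = ARIMA(y, order=(1, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit()

print(res.aic)
print(acorr_ljungbox(res.resid, lags=[12], return_df=True))  # large p-value: no strong residual dependence
print(res.get_forecast(steps=12).summary_frame().head())     # point forecasts and intervals
```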

InteractiveWidgetShell

Forecasting playground

Section 9 • Signal extraction and local adaptation

Decomposition, Smoothing, and ETS

Smoothing is about extracting persistent structure from noisy observations. Exponential smoothing is not merely heuristic weighting; in many cases it has an explicit state-space interpretation.

DefinitionBox

Classical decomposition

Split a series into trend, seasonal, and remainder components, often with moving averages or smoother seasonal extraction.

DefinitionBox

STL-style decomposition

Seasonal-Trend decomposition using Loess is more flexible than classical fixed decomposition and handles changing seasonal structure more gracefully.

DefinitionBox

ETS

ETS summarizes models by Error, Trend, and Seasonality components. Many exponential-smoothing recursions arise as optimal filters for corresponding state-space models.
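
A minimal Holt-Winters sketch (assuming statsmodels; the additive-trend, additive-seasonal choice and the synthetic monthly series are illustrative) shows one ETS-style recursion in code.

```python
# A minimal sketch, assuming statsmodels: an additive-trend, additive-seasonal
# Holt-Winters fit on a synthetic monthly series.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(4)
t = np.arange(120)
y = pd.Series(20 + 0.1 * t + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.4, t.size),
              index=pd.date_range("2014-01-01", periods=120, freq="MS"))

fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(fit.params)        # estimated smoothing parameters and initial states
print(fit.forecast(12))  # 12-step-ahead point forecasts
```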

InteractiveWidgetShell

Smoothing and ETS demo

Section 10 • Latent states behind the observations

State-Space Models and Kalman Filtering

The state-space viewpoint treats the observed series as a noisy measurement of an evolving latent state. This unifies local-level models, structural time-series models, exponential smoothing, and many forecasting systems with missing data or time-varying covariates.

FormulaBlock

Linear Gaussian state space

$$ y_t = Z_t \alpha_t + \varepsilon_t, \qquad \alpha_{t+1} = T_t \alpha_t + R_t \eta_t, $$

with observation noise $\varepsilon_t$ and state noise $\eta_t$. Filtering estimates $\alpha_t$ using data up to time $t$; smoothing uses future data as well.

RemarkBox

Filtering, smoothing, forecasting

Filtering is online estimation of the current state. Smoothing retrospectively improves state estimates using the full sample. Forecasting pushes the state forward beyond the observed window.

KeyIdeaBox

Kalman update

The filter balances model confidence and observation noise: trust the data more when the measurement is reliable; trust the prior more when the observation is noisy.
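
The update rule is easiest to see in the local-level special case. The plain-numpy sketch below filters that model with illustrative variances; it is an intuition aid, not a production filter.

```python
# A minimal sketch of the Kalman filter for a local-level model:
#   y_t = mu_t + eps_t,   mu_{t+1} = mu_t + eta_t,
# with assumed-known observation variance sigma_eps2 and state variance sigma_eta2.
import numpy as np

def local_level_filter(y, sigma_eps2=1.0, sigma_eta2=0.1, a0=0.0, p0=1e6):
    a, p = a0, p0                        # prior mean and variance of the state
    filtered_mean, filtered_var = [], []
    for obs in y:
        # Update: blend prior and observation according to their variances.
        k = p / (p + sigma_eps2)         # Kalman gain: near 1 when the observation is precise
        a = a + k * (obs - a)
        p = (1 - k) * p
        filtered_mean.append(a)
        filtered_var.append(p)
        # Predict: propagate the random-walk state one step ahead.
        p = p + sigma_eta2
    return np.array(filtered_mean), np.array(filtered_var)

rng = np.random.default_rng(5)
true_level = np.cumsum(rng.normal(0, 0.3, 200))
y = true_level + rng.normal(0, 1.0, 200)
m, v = local_level_filter(y, sigma_eps2=1.0, sigma_eta2=0.09)
print(np.mean(np.abs(m - true_level)), "vs raw", np.mean(np.abs(y - true_level)))
```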

InteractiveWidgetShell

State-space / Kalman intuition

Section 11 • Exogenous and endogenous structure

Regression with Exogenous Variables, VAR, and Cointegration

Time-series regression with exogenous variables is not the same as fully multivariate endogenous modeling. If one series predicts another, we still need to ask whether that predictor is treated as external or jointly generated inside the system.

DefinitionBox

Dynamic regression / ARIMAX

Regress on exogenous covariates while modeling the serial dependence of the errors, often with ARIMA-type error terms.

FormulaBlock

VAR

$$ Y_t = c + A_1 Y_{t-1} + \cdots + A_p Y_{t-p} + u_t. $$

Every component is modeled as depending on past values of all components.
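
A small sketch (assuming statsmodels; the bivariate simulation and the column names "demand" and "temperature" are purely illustrative) fits a VAR with the lag order chosen by AIC and produces joint forecasts.

```python
# A minimal sketch, assuming statsmodels: fit a bivariate VAR and forecast.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(6)
T = 300
y = np.zeros((T, 2))
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])             # lag-1 coefficient matrix used for the simulation
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + rng.normal(0, 1.0, 2)

df = pd.DataFrame(y, columns=["demand", "temperature"])
res = VAR(df).fit(maxlags=4, ic="aic")  # choose the lag order by AIC
print(res.k_ar)                          # selected order
print(res.forecast(df.values[-res.k_ar:], steps=5))
```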

DefinitionBox

Cointegration and VECM

If components are nonstationary but some linear combination is stationary, a VECM captures both short-run differences and long-run equilibrium correction.

RemarkBox

Dependence is not structural causality

Granger-style predictability is about incremental forecasting content, not a full structural causal claim. Useful dependence tests should not be oversold as mechanistic proof.

Section 12 • Changing variance and periodic structure

Volatility Models and the Frequency-Domain View

Not all time-series structure lives in the conditional mean. Financial returns often exhibit weak serial dependence in the mean but strong dependence in squared residuals. Separately, periodic behavior can be easier to diagnose in frequency space than in the time domain.

FormulaBlock

ARCH / GARCH intuition

$$ \varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2. $$

The conditional variance evolves through time, so shocks can cluster even if the mean process is simple.
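
Simulating the recursion makes the clustering visible: the shocks themselves look nearly uncorrelated, while their squares do not. The sketch uses plain numpy with illustrative parameter values.

```python
# A minimal sketch: simulate the GARCH(1,1) recursion from the formula above and
# verify that squared shocks show autocorrelation even though the mean is flat.
import numpy as np

rng = np.random.default_rng(7)
T, omega, alpha, beta = 2000, 0.1, 0.1, 0.85
eps = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = omega / (1 - alpha - beta)          # unconditional variance as a starting point
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

def lag1_corr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print("lag-1 corr of eps:    ", round(lag1_corr(eps), 3))       # near zero
print("lag-1 corr of eps**2: ", round(lag1_corr(eps ** 2), 3))  # clearly positive
```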

FormulaBlock

Periodogram intuition

$$ I(\omega_k) = \frac{1}{T}\left|\sum_{t=1}^{T} x_t e^{-i \omega_k t}\right|^2. $$

Peaks in the periodogram point to dominant frequencies, complementing ACF-based seasonality diagnostics.
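
The periodogram can be computed directly from the FFT, as in the following plain-numpy sketch with a synthetic period-12 signal.

```python
# A minimal sketch: the periodogram of a noisy sinusoid, computed from the FFT.
import numpy as np

rng = np.random.default_rng(8)
T = 256
t = np.arange(T)
x = np.sin(2 * np.pi * t / 12) + 0.5 * rng.standard_normal(T)   # period-12 signal plus noise

x = x - x.mean()
fft_vals = np.fft.rfft(x)
I = (np.abs(fft_vals) ** 2) / T       # periodogram ordinates at the Fourier frequencies
freqs = np.fft.rfftfreq(T, d=1.0)     # frequencies in cycles per observation

peak = freqs[np.argmax(I[1:]) + 1]    # skip the zero frequency
print("dominant frequency:", peak, "-> period ~", round(1 / peak, 1))
```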

InteractiveWidgetShell

Time-domain / frequency-domain demo

Section 13 • Validation must respect time

Forecasting Workflow, Rolling Evaluation, and Probabilistic Uncertainty

Out-of-sample evaluation in time series must preserve chronology. Random shuffling breaks the forecasting problem by leaking future information into the training set.

DefinitionBox

Point forecasts versus distributional forecasts

A point forecast gives a central value such as the conditional mean or median. A probabilistic forecast describes a full predictive distribution or at least interval forecasts at stated coverage levels.

FormulaBlock

Prediction intervals

$$ \hat{y}_{T+h|T} \pm c \,\hat{\sigma}_h $$

for an appropriate multiplier $c$ under a chosen predictive distributional assumption.

RemarkBox

Calibration and sharpness

Good probabilistic forecasts are calibrated, so coverage is honest, and sharp, so the intervals are not unnecessarily wide. One without the other is not enough.

RemarkBox

Metrics

MAE and RMSE evaluate point forecasts. Interval coverage and width, pinball loss, or CRPS-style scores evaluate probabilistic quality more directly.
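
A compact sketch of rolling-origin evaluation with a seasonal-naive forecaster, plus the pinball (quantile) loss, follows; it is plain numpy, and the horizon, training-window, quantile, and constant quantile forecast are illustrative choices.

```python
# A minimal sketch of rolling-origin (walk-forward) evaluation with a seasonal-naive
# forecaster, plus the pinball loss for a single quantile forecast.
import numpy as np

def pinball_loss(y_true, y_pred, q):
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def rolling_origin_mae(y, horizon=6, min_train=48, season=12):
    """Score h-step seasonal-naive forecasts from successive chronological origins."""
    errors = []
    for origin in range(min_train, len(y) - horizon):
        train, test = y[:origin], y[origin:origin + horizon]
        forecast = np.array([train[-season + (i % season)] for i in range(horizon)])
        errors.append(np.mean(np.abs(test - forecast)))
    return float(np.mean(errors))

rng = np.random.default_rng(9)
t = np.arange(120)
y = 10 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

print("rolling-origin MAE:", round(rolling_origin_mae(y), 3))
print("pinball(q=0.9):", round(pinball_loss(y[-12:], np.full(12, 10.0), q=0.9), 3))
```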

InteractiveWidgetShell

Rolling-origin evaluation demo

Section 14 • Other temporal tasks

Beyond Forecasting: Anomaly Detection, Classification, Clustering, Regression

Forecasting provides much of the theory, but time-series structure also matters for other supervised and unsupervised tasks.

ExampleBox

Anomaly detection

Deviations can be point anomalies, collective anomalies, or regime shifts. Temporal context matters: the same value can be normal in one phase and anomalous in another.

ExampleBox

Classification and regression

The target may be a label or scalar attached to a whole sequence or a subsequence. Feature extraction, shape-based comparison, or learned sequence encoders may all be appropriate.

ExampleBox

Clustering

Distances such as Euclidean or DTW, feature summaries, or learned embeddings define different notions of “similar behavior,” so the modeling goal has to be explicit.
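
As a concrete look at one of those distances, here is a minimal dynamic time warping sketch in plain numpy; a dedicated library with window constraints would normally be preferable for real clustering work, and the example signals are illustrative.

```python
# A minimal sketch of dynamic time warping (DTW) distance between two 1-D series,
# computed by the standard O(nm) dynamic program.
import numpy as np

def dtw_distance(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = a.size, b.size
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

t = np.linspace(0, 2 * np.pi, 60)
x1 = np.sin(t)
x2 = np.sin(1.1 * t)            # a slightly stretched version of the same shape
x3 = np.cos(t)                  # phase-shifted, so even warped alignment differs more
print("stretched sine:      ", round(dtw_distance(x1, x2), 2))
print("phase-shifted cosine:", round(dtw_distance(x1, x3), 2))
```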

ExampleBox

Monitoring and intervention

Sequential surveillance often mixes forecasting, residual control charts, changepoint detection, and intervention analysis rather than solving one isolated task.

Section 15 • Classical foundations first, then larger models

Modern Machine Learning and Deep Learning for Time Series

Modern methods do not erase the classical theory. They extend it with different inductive biases, larger function classes, and new scaling behavior, but issues such as leakage, nonstationarity, uncertainty, and evaluation remain.

DefinitionBox

Feature-based ML

Lagged features, calendar covariates, rolling summaries, and exogenous regressors often make linear models, trees, and boosted ensembles competitive baselines.
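
A small pandas/numpy sketch of this workflow (the daily series, lag choices, and calendar feature are illustrative): build lagged and rolling features without leakage, split chronologically, and fit an ordinary least-squares model; any regressor could be swapped in at the last step.

```python
# A minimal sketch of feature-based forecasting: lagged values and calendar
# covariates fed to an ordinary least-squares model.
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
idx = pd.date_range("2016-01-01", periods=400, freq="D")
y = pd.Series(5 + np.sin(2 * np.pi * idx.dayofweek / 7) + rng.normal(0, 0.3, idx.size),
              index=idx)

df = pd.DataFrame({"y": y})
for lag in (1, 7, 14):                                     # lagged targets
    df[f"lag_{lag}"] = df["y"].shift(lag)
df["dow"] = df.index.dayofweek                             # simple calendar covariate
df["roll_mean_7"] = df["y"].shift(1).rolling(7).mean()     # rolling summary, shifted to avoid leakage
df = df.dropna()

split = int(len(df) * 0.8)                                 # chronological split, no shuffling
train, test = df.iloc[:split], df.iloc[split:]
X_cols = [c for c in df.columns if c != "y"]
X_train = np.column_stack([np.ones(len(train)), train[X_cols].values])
X_test = np.column_stack([np.ones(len(test)), test[X_cols].values])
beta, *_ = np.linalg.lstsq(X_train, train["y"].values, rcond=None)
pred = X_test @ beta
print("test MAE:", round(np.mean(np.abs(test["y"].values - pred)), 3))
```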

DefinitionBox

Sequence models

RNNs, LSTMs, GRUs, and TCNs encode different locality and memory biases for sequential data, especially with nonlinear dynamics or large multi-series datasets.

DefinitionBox

Transformers and foundation-style models

Attention-based models can capture long-range interactions and large covariate sets, but they do not automatically dominate simpler baselines, especially on smaller or cleaner forecasting tasks.

RemarkBox

Honest comparison matters

Strong recent benchmarks repeatedly show that classical models, good baselines, and careful data preprocessing remain hard to beat. Large architectures are most compelling when the data regime, covariate richness, or cross-series scale actually justify them.

Section 16 • Short corrections

Common Misconceptions

A few recurring mistakes create a large fraction of bad time-series modeling decisions.

RemarkBox

“Time series is just regression with timestamps.”

Temporal dependence changes both the model class and the validation protocol. Order is part of the problem, not a feature column to append at the end.

RemarkBox

“Differencing fixes every nonstationarity.”

Differencing can be essential for stochastic trends but can also over-difference, amplify noise, and destroy level information if applied mechanically.

RemarkBox

“Higher in-sample fit means better forecasts.”

Overfitting can improve in-sample likelihood while harming multi-step predictive performance and calibration out of sample.

RemarkBox

“White noise and i.i.d. are the same.”

White noise is an uncorrelated second-order condition; independence is stronger.

RemarkBox

“Transformers always beat classical models.”

Performance depends on data regime, horizon, covariates, evaluation design, and the strength of the baseline. Bigger is not automatically better.

RemarkBox

“All nonstationarity should be removed.”

Some models represent nonstationary components directly. Removing every visible trend can erase interpretable structure that the model should actually learn.

Section 17 • Turning theory into workflow

Practical Modeling Checklist

Good time-series work is often less about finding one magical architecture and more about respecting a careful sequence of questions.

RemarkBox

1. Inspect sampling and missingness

Check regularity, gaps, duplicated timestamps, aggregation choices, and calendar alignment.

RemarkBox

2. Visualize structure first

Look for level shifts, trend, seasonality, interventions, variance changes, and obvious anomalies.

RemarkBox

3. Build strong baselines

Naive, seasonal naive, drift, and simple smoothing are often surprisingly hard to beat.

RemarkBox

4. Diagnose dependence

Use ACF/PACF, residual plots, and domain knowledge instead of relying on one formal test alone.

RemarkBox

5. Match the model to the task

Univariate forecast, exogenous regression, multivariate system, latent-state model, or anomaly monitor?

RemarkBox

6. Validate chronologically

Use rolling-origin or walk-forward designs. Never let future data leak into the training window.

RemarkBox

7. Evaluate uncertainty

Prediction intervals and forecast distributions should be checked for calibration as well as width.

RemarkBox

8. Monitor drift and regime change

Models that worked last quarter may fail after a structural break, policy change, or sensor shift.

Section 18 • Where the theory becomes practical

Applications

The same backbone of dependence, uncertainty, and structure shows up across very different domains.

ExampleBox

Economics and macro indicators

Growth, inflation, unemployment, and interventions often combine trend, seasonal adjustment, and multivariate dynamics.

ExampleBox

Energy and climate

Demand responds to weather, holidays, and long-run structural change, so exogenous covariates and seasonal effects matter.

ExampleBox

Traffic and operations

Queueing, travel times, and service demand often need multiple horizons, online updates, and intervention awareness.

ExampleBox

Finance

Mean prediction is often modest, but volatility clustering and risk quantification are central.

ExampleBox

Healthcare and physiology

Vital signs and biosignals require noise-robust filtering, anomaly detection, and often irregularly sampled data handling.

ExampleBox

Event streams and language counts

Traffic, clicks, arrivals, and token counts behave like sequences with bursts, seasonality, and intervention effects.

Section 19 • Closing map

Takeaways and Further Reading

Time series require dependence-aware thinking. Forecasting is only one task, but it supplies a large share of the core mathematics: lag structure, stationarity, uncertainty, state estimation, and chronological validation.

KeyIdeaBox

Summary

Classical models still matter because they express interpretable dependence structures, provide strong baselines, and often scale well to uncertainty-aware forecasting. State-space thinking unifies many apparently different tools. Modern deep models are most useful when they are compared honestly against those foundations rather than replacing them rhetorically.

FurtherReadingBox

Curated reading
Spectral methods

Use the NIST and STAT 510 materials to connect periodogram intuition with formal spectral analysis.

Modern deep learning

For a broad transformer-oriented survey, see Transformers in Time Series, but read it after the classical foundations on this page.