Time Series Modeling & Forecasting
What is a Time Series?
A time series is a sequence of observations collected over time, usually at regular intervals (daily sales, hourly temperature, monthly revenue, etc.).
Key properties:
- Trend: long‑term increase or decrease.
- Seasonality: repeated patterns (daily/weekly/yearly).
- Cyclic behavior: long‑term cycles not tied to a fixed calendar period.
- Noise: random fluctuations.
Basic Time Series Modeling
1. Train–Test Split for Time Series
Unlike standard machine learning datasets, time series data cannot be randomly shuffled, because shuffling leaks future information into the training set.
We usually:
- Use the earlier part of the series for training.
- Reserve the most recent segment for validation/testing.
Sliding/expanding windows are common for more advanced evaluation.
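As a minimal sketch (using a hypothetical daily series, not data from this document), a chronological split and an expanding-window walk-through might look like:

```python
import pandas as pd

# Hypothetical daily series, purely for illustration
ts = pd.Series(range(100), index=pd.date_range("2024-01-01", periods=100, freq="D"))

# Chronological split: earliest 80% for training, most recent 20% for testing
split = int(len(ts) * 0.8)
train, test = ts.iloc[:split], ts.iloc[split:]

# Expanding-window evaluation: the training window grows one step at a time
for t in range(split, len(ts)):
    history = ts.iloc[:t]   # everything observed so far
    actual = ts.iloc[t]     # the next point to forecast
    # fit a model on `history` here and compare its forecast to `actual`
```

The key point is that every test observation is strictly later than everything it is predicted from.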
2. Baseline Models
- Naive (last value) forecast: (\hat{y}_{t+1} = y_t).
- Moving average: predict the mean of the last (k) observations.
- Seasonal naive: use the value from the same season/period in the past.
These are simple but strong baselines to compare against.
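The three baselines above can be sketched in a few lines of pandas (hypothetical values; the season length of 4 is an assumption for illustration):

```python
import pandas as pd

# Hypothetical series for illustration
y = pd.Series([10, 12, 13, 12, 15, 16, 18, 17], dtype=float)

# Naive forecast: repeat the last observed value
naive = y.iloc[-1]

# Moving-average forecast: mean of the last k observations
k = 3
moving_avg = y.iloc[-k:].mean()

# Seasonal naive: value from one full season ago (season length assumed to be 4)
season = 4
seasonal_naive = y.iloc[-season]
```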
Classical Forecasting Models
ARIMA (AutoRegressive Integrated Moving Average)
ARIMA models combine:
- AR (AutoRegressive): use past values of the series.
- I (Integrated): differencing to remove trend:
\[y_t' = y_t - y_{t-1}\]
- MA (Moving Average): use past forecast errors.
An ARIMA((p, d, q)) model has:
- (p): number of AR lags
- (d): order of differencing
- (q): number of MA terms
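The differencing step (the I component) can be illustrated with pandas, where `diff()` computes exactly the first difference defined above (the values here are hypothetical):

```python
import pandas as pd

# A series with a pure linear trend: one round of differencing removes it
y = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# First difference: y'_t = y_t - y_{t-1}
y_diff = y.diff().dropna()

# After differencing, the trending series becomes constant
```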
SARIMA
SARIMA extends ARIMA with seasonal components, good for data with strong seasonality (e.g., monthly demand).
Machine Learning & Deep Learning Approaches
When we have many features or complex patterns, we can use:
- Tree‑based models (Random Forest, Gradient Boosting) on lagged features (e.g., (y_{t-1}, y_{t-7}, y_{t-30}), calendar features).
- Recurrent Neural Networks (RNNs) and LSTMs/GRUs for sequential dependencies.
- Temporal Convolutional Networks and Transformers for long‑range patterns.
The idea is to turn the time series into supervised learning form:
- Input: window of past values ([y_{t-k+1}, \dots, y_t])
- Output: next value(s) ([y_{t+1}, \dots])
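This reframing can be sketched by building a matrix of lagged features (hypothetical series; the lag choices are arbitrary for illustration):

```python
import pandas as pd

# Hypothetical series to convert into supervised-learning form
y = pd.Series(range(10), dtype=float)

# Each row: lagged past values as features, aligned with the current value as target
lags = [1, 2, 3]
X = pd.DataFrame({f"lag_{l}": y.shift(l) for l in lags}).dropna()
target = y.loc[X.index]  # y_t, predicted from y_{t-1}, y_{t-2}, y_{t-3}
```

Any regressor (e.g., gradient boosting) can then be trained on `X` and `target`.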
Anomaly Detection in Time Series
An anomaly (outlier) is an observation that does not conform to the expected pattern.
Common approaches:
- Forecasting + Residuals
- Train a forecasting model.
- Compute residuals (r_t = y_t - \hat{y}_t).
- Flag points where (|r_t|) is much larger than usual (e.g., beyond 3 standard deviations).
- Statistical thresholds
- Use rolling mean and standard deviation.
- Flag any point far from the rolling mean.
- Autoencoders / Deep models
- Train an autoencoder to reconstruct normal time series patterns.
- Large reconstruction error indicates a potential anomaly.
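The statistical-threshold approach above can be sketched with a rolling mean and standard deviation; the series and injected spike are synthetic, and the trailing window is shifted so each point is compared only against earlier observations:

```python
import numpy as np
import pandas as pd

# Synthetic noisy series with one injected anomaly (illustrative only)
rng = np.random.default_rng(42)
y = pd.Series(10 + rng.normal(0, 1, 60))
y.iloc[30] += 15  # injected spike

# Rolling statistics over a trailing window, shifted by one so the
# current point does not influence its own threshold
window = 10
mean = y.rolling(window).mean().shift(1)
std = y.rolling(window).std().shift(1)

# Flag points more than 3 rolling standard deviations from the rolling mean
z = (y - mean) / std
anomalies = y[z.abs() > 3]
```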
Simple Example in Python (ARIMA Forecasting)
```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Example: univariate time series in a pandas Series
# index: DateTimeIndex, values: some measurement (e.g., daily sales)
ts = pd.read_csv("my_timeseries.csv", parse_dates=["date"], index_col="date")["value"]

# Train-test split (last 30 days as test)
train = ts.iloc[:-30]
test = ts.iloc[-30:]

# Fit ARIMA(p, d, q). This is just an example order.
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()

forecast = model_fit.forecast(steps=30)
print("Forecasted values:")
print(forecast)
```
Simple Example: Anomaly Detection with Residuals
```python
import numpy as np

# Suppose 'residuals' is a pandas Series of y_t - y_hat_t
mean = residuals.mean()
std = residuals.std()
threshold = 3 * std  # 3-sigma rule

anomalies = residuals[np.abs(residuals - mean) > threshold]
print("Anomalies detected at:")
print(anomalies.index)
```
These examples show how to build a basic ARIMA forecast and then use residuals to detect unusual points.