Time Series Modeling & Forecasting

What is a Time Series?

A time series is a sequence of observations collected over time, usually at regular intervals (daily sales, hourly temperature, monthly revenue, etc.).

Key properties:

  • Trend: long‑term increase or decrease.
  • Seasonality: repeated patterns (daily/weekly/yearly).
  • Cyclic behavior: long‑term cycles not tied to a fixed calendar period.
  • Noise: random fluctuations.

Basic Time Series Modeling

1. Train–Test Split for Time Series

Unlike standard ML setups, we cannot shuffle time series data: a random split would leak future information into training.
We usually:

  • Use the earlier part of the series for training.
  • Reserve the most recent segment for validation/testing.

Sliding/expanding windows are common for more advanced evaluation.
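A minimal sketch of this chronological split with an expanding window, using a naive one-step forecaster as a stand-in model (the array and split size here are illustrative, not from the text):

```python
import numpy as np

series = np.arange(100)  # toy series; replace with real data
n_test = 30
train, test = series[:-n_test], series[-n_test:]

# Expanding-window evaluation: at each step, the model sees all history
# up to (but not including) the point being forecast.
errors = []
for i in range(len(test)):
    history = series[: len(train) + i]
    forecast = history[-1]  # naive forecast: repeat the last observed value
    errors.append(abs(test[i] - forecast))

mae = np.mean(errors)
```

In practice the naive forecast line would be replaced by refitting a real model on `history` at each step (or every few steps, to save compute).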

2. Baseline Models

  • Naive (last value) forecast: (\hat{y}_{t+1} = y_t).
  • Moving average: predict the mean of the last (k) observations.
  • Seasonal naive: use the value from the same season/period in the past.

These are simple but strong baselines to compare against.
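The three baselines above take only a few lines each; a sketch with made-up numbers (the series, `k`, and seasonal period `m` are illustrative):

```python
import numpy as np

y = np.array([10., 12., 13., 12., 15., 16., 18., 17.])

# Naive forecast: repeat the last observed value.
naive_forecast = y[-1]

# Moving average: mean of the last k observations.
k = 3
moving_avg_forecast = y[-k:].mean()

# Seasonal naive: value from one full season (period m) ago.
m = 4
seasonal_naive_forecast = y[-m]
```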


Classical Forecasting Models

ARIMA (AutoRegressive Integrated Moving Average)

ARIMA models combine:

  • AR (AutoRegressive): use past values of the series.
  • I (Integrated): differencing to remove trend:
\[y_t' = y_t - y_{t-1}\]
  • MA (Moving Average): use past forecast errors.

An ARIMA((p, d, q)) model has:

  • (p): number of AR lags
  • (d): order of differencing
  • (q): number of MA terms
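The differencing step (the "I" part) corresponds directly to pandas' `diff`; a small sketch with toy values:

```python
import pandas as pd

y = pd.Series([3., 5., 8., 12.])

# First-order differencing (d = 1): y'_t = y_t - y_{t-1}
y_diff = y.diff().dropna()

# For d = 2, difference again: y.diff().diff().dropna()
```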

SARIMA

SARIMA extends ARIMA with seasonal AR, differencing, and MA terms at a fixed period, which makes it well suited to data with strong seasonality (e.g., monthly demand).


Machine Learning & Deep Learning Approaches

When we have many features or complex patterns, we can use:

  • Tree‑based models (Random Forest, Gradient Boosting) on lagged features (e.g., (y_{t-1}, y_{t-7}, y_{t-30}), calendar features).
  • Recurrent Neural Networks (RNNs) and LSTMs/GRUs for sequential dependencies.
  • Temporal Convolutional Networks and Transformers for long‑range patterns.

The idea is to turn the time series into supervised learning form:

  • Input: window of past values ([y_{t-k+1}, \dots, y_t])
  • Output: next value(s) ([y_{t+1}, \dots])
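This reframing can be sketched with pandas `shift`, building lagged features for any tabular regressor (the toy series and the choice of lags 1 and 2 are illustrative):

```python
import pandas as pd

y = pd.Series(range(1, 11), name="y")  # toy series: 1, 2, ..., 10

# Supervised form: predict y_t from y_{t-1} and y_{t-2}.
df = pd.DataFrame({
    "lag_1": y.shift(1),
    "lag_2": y.shift(2),
    "target": y,
}).dropna()  # drop the first rows, where lags are undefined

# Any tabular model (random forest, gradient boosting, ...) can now be
# trained on (lag_1, lag_2) -> target.
```

Calendar features (day of week, month, holiday flags) are typically added as extra columns in the same table.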

Anomaly Detection in Time Series

An anomaly (outlier) is an observation that does not conform to the expected pattern.

Common approaches:

  1. Forecasting + Residuals
    • Train a forecasting model.
    • Compute residuals (r_t = y_t - \hat{y}_t).
    • Flag points where ( r_t ) is much larger than usual (e.g., beyond 3 standard deviations).
  2. Statistical thresholds
    • Use rolling mean and standard deviation.
    • Flag any point far from the rolling mean.
  3. Autoencoders / Deep models
    • Train an autoencoder to reconstruct normal time series patterns.
    • Large reconstruction error indicates a potential anomaly.
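Approach 2 (rolling statistics) can be sketched as follows. The series, window size, and the small floor added to the threshold are all illustrative choices; the window is computed on past values only, so a spike cannot mask itself:

```python
import pandas as pd

# Flat series with one injected spike (toy data).
y = pd.Series([10.0] * 50)
y.iloc[30] = 25.0

window = 10
past = y.shift(1)  # use only past values for the rolling statistics
rolling_mean = past.rolling(window).mean()
rolling_std = past.rolling(window).std()

# Flag points far from the rolling mean (3-sigma-style rule).
# The +1.0 floor avoids flagging everything when the rolling std is zero.
deviation = (y - rolling_mean).abs()
anomalies = y[deviation > 3 * rolling_std + 1.0]
```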

Simple Example in Python (ARIMA Forecasting)

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Example: univariate time series in a pandas Series
# index: DateTimeIndex, values: some measurement (e.g., daily sales)
ts = pd.read_csv("my_timeseries.csv", parse_dates=["date"], index_col="date")["value"]

# Train-test split (last 30 days as test)
train = ts.iloc[:-30]
test = ts.iloc[-30:]

# Fit ARIMA(p, d, q). This is just an example order.
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()

forecast = model_fit.forecast(steps=30)

print("Forecasted values:")
print(forecast)

Simple Example: Anomaly Detection with Residuals

import numpy as np

# Residuals r_t = y_t - y_hat_t, computed from the ARIMA example above
residuals = test - forecast

mean = residuals.mean()
std = residuals.std()

threshold = 3 * std  # 3-sigma rule

anomalies = residuals[np.abs(residuals - mean) > threshold]

print("Anomalies detected at:")
print(anomalies.index)

These examples show how to build a basic ARIMA forecast and then use residuals to detect unusual points.