Time Series Modeling & Forecasting
What is a Time Series?
A time series is a sequence of observations collected over time, usually at regular intervals (daily sales, hourly temperature, monthly revenue, etc.).
Key properties:
- Trend: long‑term increase or decrease.
- Seasonality: repeated patterns (daily/weekly/yearly).
- Cyclic behavior: long‑term cycles not tied to a fixed calendar period.
- Noise: random fluctuations.
Basic Time Series Modeling
1. Train–Test Split for Time Series
Unlike standard machine learning datasets, time series data cannot be randomly shuffled, because shuffling leaks future information into the training set.
We usually:
- Use the earlier part of the series for training.
- Reserve the most recent segment for validation/testing.
Sliding/expanding windows are common for more advanced evaluation.
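As a minimal sketch (using a hypothetical daily series, not data from this document), a chronological split and an expanding-window walk-through might look like:

```python
import pandas as pd

# Hypothetical daily series, purely for illustration
ts = pd.Series(range(100), index=pd.date_range("2024-01-01", periods=100, freq="D"))

# Chronological split: earliest 80% for training, most recent 20% for testing
split = int(len(ts) * 0.8)
train, test = ts.iloc[:split], ts.iloc[split:]

# Expanding-window evaluation: the training window grows one step at a time
for t in range(split, len(ts)):
    history = ts.iloc[:t]   # everything observed so far
    actual = ts.iloc[t]     # the next point to forecast
    # fit a model on `history` here and compare its forecast to `actual`
```

The key point is that every test observation is strictly later than everything it is predicted from.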
2. Baseline Models
- Naive (last value) forecast: (\hat{y}_{t+1} = y_t).
- Moving average: predict the mean of the last (k) observations.
- Seasonal naive: use the value from the same season/period in the past.
These are simple but strong baselines to compare against.
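The three baselines above can be sketched in a few lines of pandas (hypothetical values; the season length of 4 is an assumption for illustration):

```python
import pandas as pd

# Hypothetical series for illustration
y = pd.Series([10, 12, 13, 12, 15, 16, 18, 17], dtype=float)

# Naive forecast: repeat the last observed value
naive = y.iloc[-1]

# Moving-average forecast: mean of the last k observations
k = 3
moving_avg = y.iloc[-k:].mean()

# Seasonal naive: value from one full season ago (season length assumed to be 4)
season = 4
seasonal_naive = y.iloc[-season]
```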
Classical Forecasting Models
ARIMA (AutoRegressive Integrated Moving Average)
ARIMA models combine:
- AR (AutoRegressive): use past values of the series.
- I (Integrated): differencing to remove trend:
\[y_t' = y_t - y_{t-1}\]
- MA (Moving Average): use past forecast errors.
An ARIMA((p, d, q)) model has:
- (p): number of AR lags
- (d): order of differencing
- (q): number of MA terms
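The differencing step (the I component) can be illustrated with pandas, where `diff()` computes exactly the first difference defined above (the values here are hypothetical):

```python
import pandas as pd

# A series with a pure linear trend: one round of differencing removes it
y = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# First difference: y'_t = y_t - y_{t-1}
y_diff = y.diff().dropna()

# After differencing, the trending series becomes constant
```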
SARIMA
SARIMA extends ARIMA with seasonal components, good for data with strong seasonality (e.g., monthly demand).
Machine Learning & Deep Learning Approaches
When we have many features or complex patterns, we can use:
- Tree‑based models (Random Forest, Gradient Boosting) on lagged features (e.g., (y_{t-1}, y_{t-7}, y_{t-30}), calendar features).
- Recurrent Neural Networks (RNNs) and LSTMs/GRUs for sequential dependencies.
- Temporal Convolutional Networks and Transformers for long‑range patterns.
The idea is to turn the time series into supervised learning form:
- Input: window of past values ([y_{t-k+1}, \dots, y_t])
- Output: next value(s) ([y_{t+1}, \dots])
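This reframing can be sketched by building a matrix of lagged features (hypothetical series; the lag choices are arbitrary for illustration):

```python
import pandas as pd

# Hypothetical series to convert into supervised-learning form
y = pd.Series(range(10), dtype=float)

# Each row: lagged past values as features, aligned with the current value as target
lags = [1, 2, 3]
X = pd.DataFrame({f"lag_{l}": y.shift(l) for l in lags}).dropna()
target = y.loc[X.index]  # y_t, predicted from y_{t-1}, y_{t-2}, y_{t-3}
```

Any regressor (e.g., gradient boosting) can then be trained on `X` and `target`.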
Anomaly Detection in Time Series
An anomaly (outlier) is an observation that does not conform to the expected pattern.
Common approaches:
- Forecasting + Residuals
- Train a forecasting model.
- Compute residuals (r_t = y_t - \hat{y}_t).
- Flag points where (|r_t|) is much larger than usual (e.g., beyond 3 standard deviations).
- Statistical thresholds
- Use rolling mean and standard deviation.
- Flag any point far from the rolling mean.
- Autoencoders / Deep models
- Train an autoencoder to reconstruct normal time series patterns.
- Large reconstruction error indicates a potential anomaly.
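The statistical-threshold approach above can be sketched with a rolling mean and standard deviation; the series and injected spike are synthetic, and the trailing window is shifted so each point is compared only against earlier observations:

```python
import numpy as np
import pandas as pd

# Synthetic noisy series with one injected anomaly (illustrative only)
rng = np.random.default_rng(42)
y = pd.Series(10 + rng.normal(0, 1, 60))
y.iloc[30] += 15  # injected spike

# Rolling statistics over a trailing window, shifted by one so the
# current point does not influence its own threshold
window = 10
mean = y.rolling(window).mean().shift(1)
std = y.rolling(window).std().shift(1)

# Flag points more than 3 rolling standard deviations from the rolling mean
z = (y - mean) / std
anomalies = y[z.abs() > 3]
```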
Simple Example in Python (ARIMA Forecasting)
```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Example: univariate time series in a pandas Series
# index: DateTimeIndex, values: some measurement (e.g., daily sales)
ts = pd.read_csv("my_timeseries.csv", parse_dates=["date"], index_col="date")["value"]

# Train-test split (last 30 days as test)
train = ts.iloc[:-30]
test = ts.iloc[-30:]

# Fit ARIMA(p, d, q). This is just an example order.
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()

forecast = model_fit.forecast(steps=30)
print("Forecasted values:")
print(forecast)
```
Simple Example: Anomaly Detection with Residuals
```python
import numpy as np

# Suppose 'residuals' is a pandas Series of y_t - y_hat_t
mean = residuals.mean()
std = residuals.std()
threshold = 3 * std  # 3-sigma rule

anomalies = residuals[np.abs(residuals - mean) > threshold]
print("Anomalies detected at:")
print(anomalies.index)
```
These examples show how to build a basic ARIMA forecast and then use residuals to detect unusual points.