Anomaly Detection

What is Anomaly Detection?

Anomaly detection (also called outlier detection) is the task of identifying data points, events, or observations that deviate significantly from the expected pattern or behavior.

Common applications:

  • Predictive maintenance: detecting unusual vibration or temperature signatures before a machine fails.
  • Fraud detection: flagging unusual financial transactions.
  • Network security: identifying unusual traffic patterns (intrusion detection).
  • Quality control: spotting defective products on a production line.
  • Healthcare: detecting abnormal readings in patient vitals.

Types of Anomalies

Understanding the type of anomaly helps in choosing the right method.

  • Point anomaly: a single data point is far from the rest (e.g., a sudden spike in sensor reading).
  • Contextual anomaly: a value is anomalous only in a specific context (e.g., 30°C is normal in summer but unusual in winter).
  • Collective anomaly: a group of data points is anomalous together even if individual points look normal (e.g., a sequence of slightly elevated readings that together indicate a drift).

Statistical Methods

Z-Score / Standard Deviation Rule

The simplest approach: flag any point that is more than \(k\) standard deviations away from the mean.

\[z_i = \frac{x_i - \mu}{\sigma}\]
Points where \(|z_i| > 3\) (the 3-sigma rule) are typically flagged as anomalies.

Limitation: assumes the data are approximately normally distributed and stationary.
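
A minimal sketch of the rule on a NumPy array (the data and the injected spike are illustrative):

import numpy as np

def zscore_anomalies(x, k=3.0):
    # Flag points more than k standard deviations from the global mean
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > k)[0]

# Illustrative data: standard normal noise with one injected spike
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 1000)
x[500] = 8.0

print(zscore_anomalies(x))  # index 500 is flagged; a few chance exceedances may appear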

Rolling Statistics

For time series, use a rolling mean and rolling standard deviation to adapt to local behavior:

\[\text{anomaly if } |x_t - \mu_{\text{roll}}| > k \cdot \sigma_{\text{roll}}\]

This is more robust to trends and slow drift; a full rolling z-score example appears in the Python Examples section below.

Interquartile Range (IQR)

A robust alternative to z-scores:

\[\text{IQR} = Q_3 - Q_1\]

Flag points below \(Q_1 - 1.5 \cdot \text{IQR}\) or above \(Q_3 + 1.5 \cdot \text{IQR}\).

Because it relies on quartiles rather than the mean and standard deviation, the IQR rule is much less sensitive to extreme outliers.
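
A minimal sketch of the IQR rule, assuming x is a one-dimensional NumPy array:

import numpy as np

def iqr_anomalies(x, k=1.5):
    # Flag points outside [Q1 - k*IQR, Q3 + k*IQR]
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.where((x < lower) | (x > upper))[0]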


Machine Learning Methods

Isolation Forest

Isolation Forest works by recursively partitioning the feature space with random splits: each node picks a random feature and a random split value.

Key intuition: anomalies are easier to isolate (they require fewer splits) than normal points.

  • Build an ensemble of random isolation trees.
  • Compute an anomaly score for each point based on average path length to isolation.
  • Short path → easier to isolate → more likely anomalous.

Works well for high-dimensional data and is computationally efficient.

One-Class SVM

Trains a boundary around the normal data in feature space.
Points outside this boundary are flagged as anomalies.

Useful when you only have normal examples to train on (semi-supervised setting).
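
A minimal sketch with scikit-learn, where X_normal (training data containing only normal samples) and X_new (data to score) are assumed to be defined:

from sklearn.svm import OneClassSVM

# X_normal, X_new: assumed arrays of shape (n_samples, n_features)
# nu bounds the fraction of training points allowed outside the boundary
oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
oc_svm.fit(X_normal)

labels = oc_svm.predict(X_new)  # +1 = normal, -1 = anomaly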

Local Outlier Factor (LOF)

LOF measures the local density of a point relative to its neighbors.

\[\text{LOF}(x) = \frac{\text{avg. local density of neighbors}}{\text{local density of } x}\]
  • LOF ≈ 1 → point is similar to its neighbors (normal).
  • LOF ≫ 1 → point is in a sparser region than its neighbors (anomalous).

Good for detecting contextual and local anomalies that global methods miss.
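
A minimal sketch with scikit-learn's LocalOutlierFactor, assuming a feature array X:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# X: assumed array of shape (n_samples, n_features)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(X)  # +1 = normal, -1 = anomaly

# negative_outlier_factor_ holds -LOF; more negative = more anomalous
anomaly_indices = np.where(labels == -1)[0]
scores = lof.negative_outlier_factor_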


Deep Learning Methods

Autoencoder-Based Detection

An autoencoder is trained to reconstruct normal data:

  1. Encoder: compresses input into a lower-dimensional latent space.
  2. Decoder: reconstructs the input from the latent representation.

Train only on normal data. At inference:

\[\text{reconstruction error} = \| x - \hat{x} \|^2\]

High reconstruction error → the model struggles to reconstruct the point → likely anomalous.

This works well for time series, images, and tabular data.

LSTM-Based Anomaly Detection

For sequential data, an LSTM (Long Short-Term Memory) network can be trained to predict the next value in a series.

\[r_t = y_t - \hat{y}_t\]

Large residual magnitudes \(|r_t|\) indicate anomalous time steps. This was briefly introduced in the Time Series chapter and is expanded here.
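
A compact sketch of this idea in Keras. The windowed arrays (X_seq, y_next for training on normal data; X_seq_new, y_next_new for scoring) are assumed to be prepared beforehand:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# X_seq: (n_windows, window_len, 1) past values; y_next: (n_windows,) next values
model = keras.Sequential([
    keras.Input(shape=(X_seq.shape[1], 1)),
    layers.LSTM(32),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_seq, y_next, epochs=20, batch_size=32, verbose=0)

# Threshold from training residuals (3-sigma rule), applied to new data
train_res = y_next - model.predict(X_seq, verbose=0).ravel()
threshold = 3 * train_res.std()

residuals = y_next_new - model.predict(X_seq_new, verbose=0).ravel()
anomalous_steps = np.where(np.abs(residuals) > threshold)[0]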


Evaluation

Anomaly detection is tricky to evaluate because anomalies are rare (class imbalance).

Avoid plain accuracy — a model that labels everything as normal can still achieve 99% accuracy if anomalies are 1% of data.

Preferred metrics:

  • Precision: of all flagged anomalies, how many are truly anomalous?
  • Recall: of all true anomalies, how many did we catch?
  • F1-Score: harmonic mean of precision and recall.
  • AUC-ROC: overall discriminative power across thresholds.
  • AUC-PR (Precision-Recall curve): preferred when anomalies are very rare.
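
A minimal sketch with scikit-learn, assuming y_true (1 = anomaly, 0 = normal), binary predictions y_pred, and continuous anomaly scores where higher means more anomalous:

from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Threshold-free metrics use the raw scores, not the binary labels
auc_roc = roc_auc_score(y_true, scores)
auc_pr = average_precision_score(y_true, scores)  # average precision ≈ AUC-PR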

Python Examples

Statistical: Rolling Z-Score

import pandas as pd
import numpy as np

# Suppose 'ts' is a pandas Series with a DateTimeIndex
def rolling_zscore_anomalies(ts, window=30, threshold=3.0):
    # The first window-1 z-scores are NaN (not enough history), and a
    # constant window yields NaN scores; neither is ever flagged.
    rolling_mean = ts.rolling(window=window).mean()
    rolling_std = ts.rolling(window=window).std()
    z_scores = (ts - rolling_mean) / rolling_std
    anomalies = ts[np.abs(z_scores) > threshold]
    return anomalies

anomalies = rolling_zscore_anomalies(ts)
print("Anomalies detected:")
print(anomalies)

Isolation Forest

import numpy as np
from sklearn.ensemble import IsolationForest

# X: feature array of shape (n_samples, n_features)
# For univariate time series, reshape: X = ts.values.reshape(-1, 1)

clf = IsolationForest(
    n_estimators=100,
    contamination=0.05,   # expected fraction of anomalies
    random_state=42
)
clf.fit(X)

# Returns +1 (normal) or -1 (anomaly)
labels = clf.predict(X)
anomaly_indices = np.where(labels == -1)[0]

print(f"Detected {len(anomaly_indices)} anomalies.")

Autoencoder (Keras)

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assume X_train contains only normal samples, shape: (n_samples, n_features)
input_dim = X_train.shape[1]
encoding_dim = 8  # bottleneck size

# Build autoencoder
inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(32, activation="relu")(inputs)
encoded = layers.Dense(encoding_dim, activation="relu")(encoded)
decoded = layers.Dense(32, activation="relu")(encoded)
decoded = layers.Dense(input_dim, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

autoencoder.fit(
    X_train, X_train,
    epochs=50,
    batch_size=32,
    validation_split=0.1,
    verbose=1
)

# Reconstruction error on new data (X_test is assumed to be defined)
X_pred = autoencoder.predict(X_test)
reconstruction_errors = np.mean(np.square(X_test - X_pred), axis=1)

# Set threshold (e.g., 95th percentile of training errors)
X_train_pred = autoencoder.predict(X_train)
train_errors = np.mean(np.square(X_train - X_train_pred), axis=1)
threshold = np.percentile(train_errors, 95)

anomalies = np.where(reconstruction_errors > threshold)[0]
print(f"Detected {len(anomalies)} anomalies using autoencoder.")

Choosing the Right Method

Method              Best For                             Notes
Z-Score / IQR       Simple univariate, stationary data   Fast, interpretable
Rolling statistics  Time series with slow drift          Adaptive to local behavior
Isolation Forest    High-dimensional tabular data        Robust, scalable
LOF                 Local/contextual anomalies           Sensitive to neighborhood size
One-Class SVM       Small datasets, semi-supervised      Slow on large data
Autoencoder         Complex patterns, images, sequences  Needs sufficient normal data
LSTM residuals      Sequential/time series data          Best for temporal dependencies

Connection to Time Series

Anomaly detection and time series forecasting are closely linked.
Many production systems combine both:

  1. Train a forecasting model (ARIMA, LSTM, etc.) on historical data.
  2. Compute residuals between predicted and actual values.
  3. Apply a statistical threshold or learned threshold on residuals to flag anomalies.

This forecast-then-detect pipeline is often the most interpretable approach in industrial settings such as predictive maintenance — directly relevant to sensor-based condition monitoring work.

Anomaly Detection in Time Series

This section revisits anomaly detection specifically for time series, working through the forecast-then-detect pipeline described above in code.

Common approaches:

  1. Forecasting + Residuals
    • Train a forecasting model.
    • Compute residuals \(r_t = y_t - \hat{y}_t\).
    • Flag points where \(|r_t|\) is much larger than usual (e.g., beyond 3 standard deviations).
  2. Statistical thresholds
    • Use rolling mean and standard deviation.
    • Flag any point far from the rolling mean.
  3. Autoencoders / Deep models
    • Train an autoencoder to reconstruct normal time series patterns.
    • Large reconstruction error indicates a potential anomaly.

Simple Example in Python (ARIMA Forecasting)

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Example: univariate time series in a pandas Series
# index: DateTimeIndex, values: some measurement (e.g., daily sales)
ts = pd.read_csv("my_timeseries.csv", parse_dates=["date"], index_col="date")["value"]

# Train-test split (last 30 days as test)
train = ts.iloc[:-30]
test = ts.iloc[-30:]

# Fit ARIMA(p, d, q). This is just an example order.
model = ARIMA(train, order=(2, 1, 2))
model_fit = model.fit()

forecast = model_fit.forecast(steps=30)

print("Forecasted values:")
print(forecast)

Simple Example: Anomaly Detection with Residuals

import numpy as np

# Suppose 'residuals' is a pandas Series of y_t - y_hat_t,
# e.g. residuals = test - forecast from the ARIMA example above
mean = residuals.mean()
std = residuals.std()

threshold = 3 * std  # 3-sigma rule

anomalies = residuals[np.abs(residuals - mean) > threshold]

print("Anomalies detected at:")
print(anomalies.index)

These examples show how to build a basic ARIMA forecast and then use residuals to detect unusual points.