Project Overview: Predictive Maintenance for Material Handlers¶
Objective: To develop a robust, scalable anomaly detection framework for fleet-wide material handler assets using high-resolution sensor data.
Methodology: Addressed non-constant temporal resolution and transmission gaps by implementing a Session-based segmentation strategy. This ensures temporal consistency within LSTM windows and ensures that the model does not interprete system-off periods as physical anomalies.
Model: Employed an LSTM Autoencoder (Deep Learning) to learn the temporal "fingerprint" of normal machine operations.
Detection: Used Reconstruction Error as the anomaly metric. A tunable threshold based on error percentiles allows for adjustable sensitivity (Strictness vs. False Alarm rate).
Key Results: The model successfully identifies deviations in Battery Voltage that correlate with potential electrical subsystem degradation.
Scalability Outlook: The architecture has the potential to be scaled to Multivariate inputs (combining multiple signals) and potentially be used as benchmark for Multi-asset fleets via Transfer Learning, providing a path toward prescriptive maintenance and automated failure diagnosis.
## imports section
#basic imports
import pandas as pd
import numpy as np
from datetime import datetime
# plotting tools
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
import matplotlib.cm as cm
#modeling tools
from scipy.stats import zscore
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
#warning suppression
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
## loading the original data
# raw_data = pd.read_parquet("https://drive.usercontent.google.com/download?id=1qEUpOJuKhyf7AcXTafmDKcO1zLvBMeU-&export=download&authuser=0")
# raw_data.head()
raw_data = pd.read_parquet('anamoly_data.parquet')
# here, i have intentionally, downloaded the original data from the drive location
# and using from local source, instead of downloading from the drive location frequently.
raw_data["timestamp"] = pd.to_datetime(raw_data["timestamp"])
raw_data.head()
| asset_id | timestamp | signal | value | |
|---|---|---|---|---|
| 0 | machine_1 | 2024-01-31 12:58:45 | fuelconsumption.idle.liter | 1838.700 |
| 1 | machine_1 | 2024-01-31 12:58:45 | fuelconsumption.load.liter | 39793.300 |
| 2 | machine_1 | 2024-01-31 12:58:45 | value.batteryvoltage.voltage | 27.944 |
| 3 | machine_1 | 2024-01-31 12:58:45 | value.chargingvoltage.voltage | 28.126 |
| 4 | machine_1 | 2024-01-31 12:58:45 | value.common.engine.fuel.used.total | 41632.000 |
Task1: Data analysis¶
observations-1
The parquet file appears to contain data dump of multiple assets and multiple signals. Individual signals can be sorted/fetched using dataframe splicing and filtering.
raw_data.groupby(["asset_id", "signal"])["value"].describe()
| count | mean | std | min | 25% | 50% | 75% | max | ||
|---|---|---|---|---|---|---|---|---|---|
| asset_id | signal | ||||||||
| machine_1 | fuelconsumption.idle.liter | 115197.0 | 2640.475545 | 461.626927 | 1838.700 | 2284.200 | 2652.300 | 2926.200 | 3637.300 |
| fuelconsumption.load.liter | 115197.0 | 62623.986718 | 12757.014918 | 39793.300 | 52495.000 | 61900.800 | 72749.500 | 87263.800 | |
| value.batteryvoltage.voltage | 115200.0 | 27.782047 | 0.736786 | 13.598 | 27.852 | 27.953 | 28.045 | 28.608 | |
| value.chargingvoltage.voltage | 115200.0 | 26.476483 | 6.596665 | 0.005 | 28.126 | 28.224 | 28.330 | 29.760 | |
| value.common.engine.fuel.used.total | 115201.0 | 65263.740383 | 13215.981707 | 41632.000 | 54778.000 | 64552.900 | 75675.700 | 90901.100 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| machine_9 | value.dieselengineactualspeed.rotation | 67886.0 | 1292.972763 | 803.895574 | 0.000 | 896.000 | 1824.000 | 1969.000 | 2120.000 |
| value.electriccurrent.current | 67886.0 | 0.000000 | 0.000000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| value.enginesuperchargedair.temperature | 67886.0 | 37.828742 | 13.384035 | 4.000 | 29.000 | 37.000 | 45.000 | 82.000 | |
| value.hydraulicoil.temperature | 67886.0 | 41.304687 | 9.788502 | -20.000 | 37.000 | 43.000 | 48.000 | 65.000 | |
| value.hydraulicoilfanactualspeed.rotation | 67886.0 | 504.978965 | 364.411841 | 0.000 | 213.000 | 577.000 | 801.000 | 1699.000 |
130 rows × 8 columns
raw_data.groupby('asset_id').count()
| timestamp | signal | value | |
|---|---|---|---|
| asset_id | |||
| machine_1 | 1497615 | 1497615 | 1497615 |
| machine_10 | 916461 | 916461 | 916461 |
| machine_2 | 2279776 | 2279776 | 2279776 |
| machine_3 | 914323 | 914323 | 914323 |
| machine_4 | 1400329 | 1400329 | 1400329 |
| machine_5 | 1417004 | 1417004 | 1417004 |
| machine_6 | 2493067 | 2493067 | 2493067 |
| machine_7 | 1651440 | 1651440 | 1651440 |
| machine_8 | 1022788 | 1022788 | 1022788 |
| machine_9 | 882506 | 882506 | 882506 |
observation : data contains 10 machines info
raw_data.groupby(["asset_id", "signal"])["value"].count()
asset_id signal
machine_1 fuelconsumption.idle.liter 115197
fuelconsumption.load.liter 115197
value.batteryvoltage.voltage 115200
value.chargingvoltage.voltage 115200
value.common.engine.fuel.used.total 115201
...
machine_9 value.dieselengineactualspeed.rotation 67886
value.electriccurrent.current 67886
value.enginesuperchargedair.temperature 67886
value.hydraulicoil.temperature 67886
value.hydraulicoilfanactualspeed.rotation 67886
Name: value, Length: 130, dtype: int64
# number of data points per asset
raw_data.groupby(['asset_id'])['value'].count()
asset_id machine_1 1497615 machine_10 916461 machine_2 2279776 machine_3 914323 machine_4 1400329 machine_5 1417004 machine_6 2493067 machine_7 1651440 machine_8 1022788 machine_9 882506 Name: value, dtype: int64
# Count unique signals per asset
asset_signal_counts = raw_data.groupby("asset_id")["signal"].nunique().reset_index()
# Rename columns for clarity
asset_signal_counts.columns = ["asset_id", "num_signals"]
asset_signal_counts.head(10)
| asset_id | num_signals | |
|---|---|---|
| 0 | machine_1 | 13 |
| 1 | machine_10 | 13 |
| 2 | machine_2 | 13 |
| 3 | machine_3 | 13 |
| 4 | machine_4 | 13 |
| 5 | machine_5 | 13 |
| 6 | machine_6 | 13 |
| 7 | machine_7 | 13 |
| 8 | machine_8 | 13 |
| 9 | machine_9 | 13 |
machine_summary = pd.DataFrame(raw_data.groupby('asset_id')['signal'].unique()).reset_index()
# for asset in raw_data['asset_id'].unique():
# signals = raw_data.loc[raw_data['asset_id'] == asset, 'signal'].unique()
# print(f"Asset ID: {asset}")
# print(f"Signals: {signals}")
# print("-" * 40)
machine_summary.head(11)
| asset_id | signal | |
|---|---|---|
| 0 | machine_1 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 1 | machine_10 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 2 | machine_2 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 3 | machine_3 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 4 | machine_4 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 5 | machine_5 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 6 | machine_6 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 7 | machine_7 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 8 | machine_8 | [fuelconsumption.idle.liter, fuelconsumption.l... |
| 9 | machine_9 | [fuelconsumption.idle.liter, fuelconsumption.l... |
# Initialize the status column
machine_summary['signal_status'] = ""
# Outer loop: go through each machine
for i, row in machine_summary.iterrows():
machine_signals = set(row['signal']) # signals of current machine
asset = row['asset_id']
# Assume all signals are same until proven otherwise
status = "same"
# Inner loop: compare with machines that come after the current one
for j, other_row in machine_summary.iloc[i+1:].iterrows():
other_signals = set(other_row['signal'])
if other_signals != machine_signals:
status = "different"
break # no need to check further
# Assign the result
machine_summary.at[i, 'signal_status'] = status
machine_summary.head(11)
| asset_id | signal | signal_status | |
|---|---|---|---|
| 0 | machine_1 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 1 | machine_10 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 2 | machine_2 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 3 | machine_3 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 4 | machine_4 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 5 | machine_5 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 6 | machine_6 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 7 | machine_7 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 8 | machine_8 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
| 9 | machine_9 | [fuelconsumption.idle.liter, fuelconsumption.l... | same |
this algorithm compared the signals successively and ensured that all the machines have same signals.
# list of signals
raw_data['signal'].unique()
array(['fuelconsumption.idle.liter', 'fuelconsumption.load.liter',
'value.batteryvoltage.voltage', 'value.chargingvoltage.voltage',
'value.common.engine.fuel.used.total',
'value.common.machine.hours.operation.total',
'value.coolantsolution.temperature',
'value.coolantsolutionfanactualspeed.rotation',
'value.dieselengineactualspeed.rotation',
'value.electriccurrent.current',
'value.enginesuperchargedair.temperature',
'value.hydraulicoil.temperature',
'value.hydraulicoilfanactualspeed.rotation'], dtype=object)
columns = ["asset_id", "signal", "timestamp"]
anomaly_data = raw_data.sort_values(columns).set_index(columns)
anomaly_data.loc[(anomaly_data.index[0][0], "value.batteryvoltage.voltage")].plot(figsize=(16,6))
<Axes: xlabel='timestamp'>
Task 2: selecting the right signal for anamoly detection¶
defining signal ranking algorithm using Anomaly Score¶
- Here, I define the custom defination to access the anamoly of signal.
- the approach consists of 3 parameters, rate of anamolies, average z score of anamolies and maximum z score for each signal.
Where:¶
$N_{|Z|>3}$ = number of strong anomalies
$N$ = total number of samples
$\overline{|Z|}_{|Z|>3}$ = average anomaly magnitude for points where $|Z|>3$
$\max(|Z|)$ = strongest spike in the signal
Interpretation of Signal Score Components¶
| Component | Description |
|---|---|
| Anomaly Rate | Frequency of occurrences where $|Z| > 3$ |
| Anomaly Magnitude | Average strength of the detected anomalies |
| Peak Anomaly | Maximum observed anomaly intensity |
This approach would rank the Signals with frequent, strong, and extreme deviations the highest.
Here we use Z-score as a measure to assess each signal individually and select the signal with maximum z-score for anomaly detection.
The approach is to identify the signal with maximum anamolies. Here, contrary to the popular beliefs, of selecting the signal with maximum Z score as the signal (meaning, the signal with farthest anamoly point) i am trying to select the signal with maximum number of effective anamoly points.
Here, to achieve this objective, i first calculate the number of datapoints per signal with z scores > 3. (i have selected Z-score =3 as a threshold to denote a signal as anamoly. this threshold has beeen selected from thumb rules and prior usecases). i then calculate the average z score of all the anamoly points, and then multiply the count and average as a signal score.
I then use the anomaly score to reporesent the Anamoli-ness of the signal.
The advantage this offer is that it considers the smaller anamolies, which may be frequently appearing effectively as opposed to selecting the maximum z score (which could have been a one of event)
def compute_anomaly_score(values, z_threshold):
z = np.abs(zscore(values))
anomaly_rate = (np.abs(z) > z_threshold).mean()
if (np.abs(z) > z_threshold).sum() > 0:
mean_anomaly = np.abs(z)[np.abs(z) > z_threshold].mean()
else:
mean_anomaly = 0
max_z = z.max()
score = anomaly_rate * mean_anomaly * max_z
return score
def compute_anomaly_rate(values, z_threshold):
z = np.abs(zscore(values))
return (np.abs(z) > z_threshold).mean()
z_threshold = 3
# z threshold is considered as +/- 3, as its commonly used
# stastistic within which more than 99% of the data should reside.
anomaly_scores = raw_data.groupby(["asset_id", "signal"])["value"].agg(
anomaly_score = lambda x: compute_anomaly_score(x, z_threshold),
anomaly_rate = lambda x: compute_anomaly_rate(x, z_threshold)
).reset_index()
anomaly_scores.head()
| asset_id | signal | anomaly_score | anomaly_rate | |
|---|---|---|---|---|
| 0 | machine_1 | fuelconsumption.idle.liter | 0.000000 | 0.000000 |
| 1 | machine_1 | fuelconsumption.load.liter | 0.000000 | 0.000000 |
| 2 | machine_1 | value.batteryvoltage.voltage | 3.687401 | 0.048281 |
| 3 | machine_1 | value.chargingvoltage.voltage | 0.992641 | 0.065946 |
| 4 | machine_1 | value.common.engine.fuel.used.total | 0.000000 | 0.000000 |
anomaly_scores.shape[0]
130
anomaly_scores.info()
<class 'pandas.DataFrame'> RangeIndex: 130 entries, 0 to 129 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 asset_id 130 non-null str 1 signal 130 non-null str 2 anomaly_score 99 non-null float64 3 anomaly_rate 130 non-null float64 dtypes: float64(2), str(2) memory usage: 4.2 KB
there are NAN's introduced by our anomaly_score. lets investigate further on this.
anomaly_scores[anomaly_scores['anomaly_score'].isna()]
| asset_id | signal | anomaly_score | anomaly_rate | |
|---|---|---|---|---|
| 9 | machine_1 | value.electriccurrent.current | NaN | 0.0 |
| 13 | machine_10 | fuelconsumption.idle.liter | NaN | 0.0 |
| 14 | machine_10 | fuelconsumption.load.liter | NaN | 0.0 |
| 16 | machine_10 | value.chargingvoltage.voltage | NaN | 0.0 |
| 17 | machine_10 | value.common.engine.fuel.used.total | NaN | 0.0 |
| 19 | machine_10 | value.coolantsolution.temperature | NaN | 0.0 |
| 20 | machine_10 | value.coolantsolutionfanactualspeed.rotation | NaN | 0.0 |
| 21 | machine_10 | value.dieselengineactualspeed.rotation | NaN | 0.0 |
| 22 | machine_10 | value.electriccurrent.current | NaN | 0.0 |
| 23 | machine_10 | value.enginesuperchargedair.temperature | NaN | 0.0 |
| 39 | machine_3 | fuelconsumption.idle.liter | NaN | 0.0 |
| 40 | machine_3 | fuelconsumption.load.liter | NaN | 0.0 |
| 42 | machine_3 | value.chargingvoltage.voltage | NaN | 0.0 |
| 43 | machine_3 | value.common.engine.fuel.used.total | NaN | 0.0 |
| 45 | machine_3 | value.coolantsolution.temperature | NaN | 0.0 |
| 46 | machine_3 | value.coolantsolutionfanactualspeed.rotation | NaN | 0.0 |
| 47 | machine_3 | value.dieselengineactualspeed.rotation | NaN | 0.0 |
| 48 | machine_3 | value.electriccurrent.current | NaN | 0.0 |
| 49 | machine_3 | value.enginesuperchargedair.temperature | NaN | 0.0 |
| 91 | machine_7 | fuelconsumption.idle.liter | NaN | 0.0 |
| 92 | machine_7 | fuelconsumption.load.liter | NaN | 0.0 |
| 94 | machine_7 | value.chargingvoltage.voltage | NaN | 0.0 |
| 95 | machine_7 | value.common.engine.fuel.used.total | NaN | 0.0 |
| 97 | machine_7 | value.coolantsolution.temperature | NaN | 0.0 |
| 98 | machine_7 | value.coolantsolutionfanactualspeed.rotation | NaN | 0.0 |
| 99 | machine_7 | value.dieselengineactualspeed.rotation | NaN | 0.0 |
| 100 | machine_7 | value.electriccurrent.current | NaN | 0.0 |
| 101 | machine_7 | value.enginesuperchargedair.temperature | NaN | 0.0 |
| 103 | machine_7 | value.hydraulicoilfanactualspeed.rotation | NaN | 0.0 |
| 113 | machine_8 | value.electriccurrent.current | NaN | 0.0 |
| 126 | machine_9 | value.electriccurrent.current | NaN | 0.0 |
visualising some of these signals randomly.
raw_data[
(raw_data['asset_id'] == 'machine_1') &
(raw_data['signal'] == 'value.electriccurrent.current')
].plot(x='timestamp', y='value')
<Axes: xlabel='timestamp'>
raw_data[
(raw_data['asset_id'] == 'machine_7') &
(raw_data['signal'] == 'value.dieselengineactualspeed.rotation')
].plot(x='timestamp', y='value')
<Axes: xlabel='timestamp'>
from inspecting the signals with NAN as anomaly score, it is understood these signals do not contain any anomalies and in the formula since we used |Z| > 3 (sometimes, the numpy returs it as NAN), hence the anomaly score = NAN.
we can safely fill the NAN's with 0, for simplicity.
anomaly_scores.fillna(0, inplace=True)
| asset_id | signal | anomaly_score | anomaly_rate | |
|---|---|---|---|---|
| 0 | machine_1 | fuelconsumption.idle.liter | 0.000000 | 0.000000 |
| 1 | machine_1 | fuelconsumption.load.liter | 0.000000 | 0.000000 |
| 2 | machine_1 | value.batteryvoltage.voltage | 3.687401 | 0.048281 |
| 3 | machine_1 | value.chargingvoltage.voltage | 0.992641 | 0.065946 |
| 4 | machine_1 | value.common.engine.fuel.used.total | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... |
| 125 | machine_9 | value.dieselengineactualspeed.rotation | 0.000000 | 0.000000 |
| 126 | machine_9 | value.electriccurrent.current | 0.000000 | 0.000000 |
| 127 | machine_9 | value.enginesuperchargedair.temperature | 0.012556 | 0.001252 |
| 128 | machine_9 | value.hydraulicoil.temperature | 0.165434 | 0.007513 |
| 129 | machine_9 | value.hydraulicoilfanactualspeed.rotation | 0.000614 | 0.000059 |
130 rows × 4 columns
anomaly_scores.info()
<class 'pandas.DataFrame'> RangeIndex: 130 entries, 0 to 129 Data columns (total 4 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 asset_id 130 non-null str 1 signal 130 non-null str 2 anomaly_score 130 non-null float64 3 anomaly_rate 130 non-null float64 dtypes: float64(2), str(2) memory usage: 4.2 KB
Signal Selection for Anomaly Detection¶
To select a suitable signal for the anomaly detection algorithm, we follow these principles:
Anomaly rate should be low: Ideally, the anomaly rate should be less than 3%. Signals with higher anomaly rates risk causing the model to classify anomalies as normal, since most anomaly detection algorithms assume that normal behavior dominates the data.
Maximize anomaly severity: Among signals that have an anomaly rate within the acceptable range (1–3%), we select the signal with the highest anomaly score. This ensures that the chosen signal exhibits clear and strong deviations, improving the effectiveness of the anomaly detection algorithm.
In short, we choose the signal with the strongest anomalies while maintaining a low enough anomaly rate to preserve the model’s ability to distinguish normal from abnormal behavior.
anomaly_scores[anomaly_scores['anomaly_rate'] < 0.03].sort_values(
by='anomaly_score', ascending=False
)
| asset_id | signal | anomaly_score | anomaly_rate | |
|---|---|---|---|---|
| 41 | machine_3 | value.batteryvoltage.voltage | 3.478893 | 0.013706 |
| 84 | machine_6 | value.coolantsolution.temperature | 2.197817 | 0.017369 |
| 54 | machine_4 | value.batteryvoltage.voltage | 2.085640 | 0.022031 |
| 32 | machine_2 | value.coolantsolution.temperature | 1.560607 | 0.011769 |
| 71 | machine_5 | value.coolantsolution.temperature | 1.501945 | 0.022678 |
| ... | ... | ... | ... | ... |
| 121 | machine_9 | value.common.engine.fuel.used.total | 0.000000 | 0.000000 |
| 125 | machine_9 | value.dieselengineactualspeed.rotation | 0.000000 | 0.000000 |
| 124 | machine_9 | value.coolantsolutionfanactualspeed.rotation | 0.000000 | 0.000000 |
| 122 | machine_9 | value.common.machine.hours.operation.total | 0.000000 | 0.000000 |
| 126 | machine_9 | value.electriccurrent.current | 0.000000 | 0.000000 |
121 rows × 4 columns
best signal according to our algorithm is 'batteryvoltage' signal from machine_3. lets visualise the same.
raw_data[
(raw_data['asset_id'] == 'machine_3') &
(raw_data['signal'] == 'value.batteryvoltage.voltage')
].plot(x='timestamp', y='value', figsize=(16,6), ylim = (0,35));
the data seems to contain lots of anomalies. Also, lets see the stastistics of the same signal in other assets and see if the problem is isolated to one machine or its across fleet.
anomaly_scores[anomaly_scores['signal'] == 'value.batteryvoltage.voltage']
| asset_id | signal | anomaly_score | anomaly_rate | |
|---|---|---|---|---|
| 2 | machine_1 | value.batteryvoltage.voltage | 3.687401 | 0.048281 |
| 15 | machine_10 | value.batteryvoltage.voltage | 0.954720 | 0.036654 |
| 28 | machine_2 | value.batteryvoltage.voltage | 5.453929 | 0.057509 |
| 41 | machine_3 | value.batteryvoltage.voltage | 3.478893 | 0.013706 |
| 54 | machine_4 | value.batteryvoltage.voltage | 2.085640 | 0.022031 |
| 67 | machine_5 | value.batteryvoltage.voltage | 0.800622 | 0.021717 |
| 80 | machine_6 | value.batteryvoltage.voltage | 0.500886 | 0.017114 |
| 93 | machine_7 | value.batteryvoltage.voltage | 0.000000 | 0.000000 |
| 106 | machine_8 | value.batteryvoltage.voltage | 0.749679 | 0.007207 |
| 119 | machine_9 | value.batteryvoltage.voltage | 0.248952 | 0.002313 |
it appears that the battery voltage signal contains anomalies across many assets.
Task 3: applying anomaly detection algorithms¶
step 1: preparing data¶
# ------------------------------
# 1. Selecting the signal
# ------------------------------
asset = 'machine_3'
signal = 'value.batteryvoltage.voltage'
# Filtering the data for this asset and signal
signal_data = raw_data[
(raw_data['asset_id'] == asset) &
(raw_data['signal'] == signal)
][['timestamp','value']]
signal_data.head()
| timestamp | value | |
|---|---|---|
| 3777393 | 2024-02-29 04:51:15 | 23.915 |
| 3777406 | 2024-02-29 04:52:18 | 23.940 |
| 3777419 | 2024-02-29 04:54:18 | 23.991 |
| 3777432 | 2024-02-29 04:56:18 | 23.722 |
| 3777445 | 2024-02-29 04:58:18 | 23.730 |
signal_data.describe()
| timestamp | value | |
|---|---|---|
| count | 70332 | 70332.000000 |
| mean | 2024-08-15 08:43:04.475871488 | 23.908168 |
| min | 2024-02-29 04:51:15 | 3.853000 |
| 25% | 2024-06-04 07:47:47 | 23.646000 |
| 50% | 2024-09-05 21:20:22.500000 | 23.890000 |
| 75% | 2024-10-31 09:38:13 | 24.125000 |
| max | 2024-12-31 05:36:37 | 28.381000 |
| std | NaN | 0.541947 |
# timeseries plotting - using plotly go for interactive analysis
# creating time series to perform timeseries EDA on the selected signal
signal_data['timestamp'] = pd.DatetimeIndex(signal_data ['timestamp'])
signal_data = signal_data.set_index('timestamp')
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=signal_data.index,
y=signal_data['value'],
mode='lines+markers',
name=f'Signal: {signal}',
line=dict(color='royalblue'),
marker=dict(size=4)
)
)
# Add layout for better readability
fig.update_layout(
title=f"Interactive Signal Plot for Asset {asset}",
yaxis_title="Signal Value",
template="plotly_white",
hovermode="x unified"
)
# Show the interactive plot
fig.show()
Identifying the non uniform data points and visualising the differences.
delta = signal_data.index.diff().dropna()
delta_sec = delta.total_seconds()
delta_min = delta_sec / 60
print(delta_min)
print(delta_min.values)
fig = go.Figure()
fig.add_trace(
go.Scatter(
y=delta_min.values,
mode='lines+markers',
name=f'Signal: voltage signal_machine 3',
line=dict(color='royalblue'),
marker=dict(size=4)
)
)
# Add layout for better readability
fig.update_layout(
title=f"Interactive Signal Plot for Asset {asset}",
yaxis_title="Time delta (min)",
template="plotly_white",
hovermode="x unified"
)
# Show the interactive plot
fig.show()
Index([ 1.05, 2.0, 2.0,
2.0, 2.0, 2.0,
2.0166666666666666, 2.0, 2.0,
2.0,
...
1.8666666666666667, 0.0, 854.2166666666667,
1.65, 2.0, 2.0,
2.0, 2.0, 0.16666666666666666,
0.0],
dtype='float64', name='timestamp', length=70331)
[1.05 2. 2. ... 2. 0.16666667 0. ]
print(len(delta_min.to_series().unique()))
delta_min.to_series().unique()
plt.figure(figsize=(10,5))
plt.hist(delta_min, bins=50, edgecolor='k', log=True)
plt.title("Distribution of Measurement Intervals (delta_mins)")
plt.xlabel("Minutes between measurements")
plt.ylabel("Occurances")
plt.show()
634
from these plots,
- It is observed that the data has large occurances of 2 min interval(first bin) data and handful of bigger intervals.
- It would be better to split into multiple sessions, rather than treating it as one single event which would fully confuse the sequence based - ML models which depend on continuity.
signal_data.describe()
| value | |
|---|---|
| count | 70332.000000 |
| mean | 23.908168 |
| std | 0.541947 |
| min | 3.853000 |
| 25% | 23.646000 |
| 50% | 23.890000 |
| 75% | 24.125000 |
| max | 28.381000 |
data sequencing¶
Handling Discontinuities in the Time Series
The dataset contains several temporal discontinuities (downtime periods). If these gaps are not handled properly, the neural network may incorrectly learn patterns that are artifacts of missing time rather than real system behavior.
To address this, the time series is first segmented into independent sessions. A new session is defined whenever the time gap between two consecutive timestamps exceeds a predefined threshold.
In this analysis:
A downtime threshold of 10 minutes is used.
If the time difference between two samples is greater than 10 minutes, a new session is started.
Each session is then processed independently.
After session segmentation, the data within each session is converted into fixed-length sequences using a sliding window approach. These sequences are then used as inputs to the neural network.
This approach ensures that:
The model does not learn artificial patterns caused by downtime gaps.
Temporal relationships are preserved only within continuous operational periods.
Training data better represents real operational dynamics of the system.
A[Start] --> B{Calculate Time Differences (diffs)};
B --> C{Define THRESHOLD_MIN (e.g., 10 mins downtime)};
C --> D{Identify New Session Starts (diffs > THRESHOLD_MIN or isNa)};
D --> E[Assign session_id using cumsum()];
E --> F{Define SEQ_LENGTH (Window Size)};
F --> G[Create Sequences (Sliding Window within Sessions)];
G --> H[Perform Train-Test Split];
H --> I[End];
# --- 1. Configuration ---
THRESHOLD_MIN = 10 # Any gap > 10 mins starts a new session
SEQ_LENGTH = 30 # Number of 2-min intervals to look at (1 hour of data)
# --- 2. Create Session IDs ---
# We calculate the time difference between rows
diffs = signal_data.index.to_series().diff().dt.total_seconds() / 60
# --- 3. Identify where a new session starts (first row is always a new session start)
new_session_mask = (diffs > THRESHOLD_MIN) | (diffs.isna())
signal_data['session_id'] = new_session_mask.cumsum()
# --- 4. sequence generation
def create_sequences(df, window_size):
sequences = []
timestamps = []
for _, group in df.groupby('session_id'):
values = group['value'].values
index = group.index
if len(values) > window_size:
for i in range(len(values) - window_size):
sequences.append(values[i:i + window_size])
timestamps.append(index[i + window_size - 1])
X = np.expand_dims(np.array(sequences), axis=-1)
return X, np.array(timestamps)
# --- 5. train-test split
split_ratio = 0.8
split_index = int(len(signal_data) * split_ratio)
split_time = signal_data.index[split_index]
signal_data['dataset_split'] = 'train'
signal_data.loc[signal_data.index >= split_time, 'dataset_split'] = 'test'
train_df = signal_data[signal_data['dataset_split'] == 'train']
test_df = signal_data[signal_data['dataset_split'] == 'test']
X_train, ts_train = create_sequences(train_df, SEQ_LENGTH)
X_test, ts_test = create_sequences(test_df, SEQ_LENGTH)
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
Train shape: (48802, 30, 1) Test shape: (12405, 30, 1)
Note:
- To prevent temporal leakage, the dataset is split chronologically: the first 80% of observations are used for training and the most recent 20% are reserved for testing.
signal_data.head()
| value | session_id | dataset_split | |
|---|---|---|---|
| timestamp | |||
| 2024-02-29 04:51:15 | 23.915 | 1 | train |
| 2024-02-29 04:52:18 | 23.940 | 1 | train |
| 2024-02-29 04:54:18 | 23.991 | 1 | train |
| 2024-02-29 04:56:18 | 23.722 | 1 | train |
| 2024-02-29 04:58:18 | 23.730 | 1 | train |
signal_data['session_id'].nunique()
412
model selection reasoning:
- The LSTM Autoencoder was selected because it captures long-term temporal dependencies in sensor data. Unlike static models, it learns the expected 'shape' of a voltage cycle, making it sensitive to subtle degradation patterns rather than just point outliers.
# --- 1. Data Prep (Normalization the data since we are using Neural Nets) ---
scaler = StandardScaler()
# Reshape to 2D to scale, then back to 3D
X_train_flat = X_train.reshape(-1, X_train.shape[-1])
X_train_scaled = scaler.fit_transform(X_train_flat).reshape(X_train.shape)
# fit and transform on train set
X_test_flat = X_test.reshape(-1, X_test.shape[-1])
X_test_scaled = scaler.transform(X_test_flat).reshape(X_test.shape)
# only transform on test to avoide data leakage
# --- 2. Build the Model ---
model = Sequential([
# Encoder
LSTM(
64,
activation='tanh',
input_shape=(SEQ_LENGTH, 1),
return_sequences=False,
dropout=0.2,
recurrent_dropout=0.2
),
# Dropout(0.2),
# bridge
RepeatVector(SEQ_LENGTH),
#decoder
LSTM(
64,
activation='tanh',
return_sequences=True,
dropout=0.2,
recurrent_dropout=0.2
),
# Dropout(0.2),
#output
TimeDistributed(Dense(1))
]);
model.compile(optimizer='adam', loss='mse');
model.summary()
# --- 3. Training ---
early_stop = EarlyStopping(
monitor='val_loss',
patience=5,
restore_best_weights=True,
verbose=1
)
history = model.fit(X_train_scaled, X_train_scaled,
epochs=50,
batch_size=32,
validation_split=0.25,
callbacks =[early_stop],
verbose=1)
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Convergence: Training vs Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error (MSE)')
plt.legend()
plt.grid(True)
plt.show()
WARNING:tensorflow:TensorFlow GPU support is not available on native Windows for TensorFlow >= 2.11. Even if CUDA/cuDNN are installed, GPU will not be used. Please use WSL2 or the TensorFlow-DirectML plugin.
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ lstm (LSTM) │ (None, 64) │ 16,896 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ repeat_vector (RepeatVector) │ (None, 30, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ lstm_1 (LSTM) │ (None, 30, 64) │ 33,024 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ time_distributed │ (None, 30, 1) │ 65 │ │ (TimeDistributed) │ │ │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 49,985 (195.25 KB)
Trainable params: 49,985 (195.25 KB)
Non-trainable params: 0 (0.00 B)
Epoch 1/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 32s 26ms/step - loss: 0.5117 - val_loss: 0.1948 Epoch 2/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 40s 35ms/step - loss: 0.4291 - val_loss: 0.1772 Epoch 3/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 39s 34ms/step - loss: 0.4019 - val_loss: 0.1694 Epoch 4/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 34s 30ms/step - loss: 0.3907 - val_loss: 0.1490 Epoch 5/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 44s 39ms/step - loss: 0.3694 - val_loss: 0.1475 Epoch 6/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 35s 31ms/step - loss: 0.3408 - val_loss: 0.1549 Epoch 7/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 41s 36ms/step - loss: 0.3388 - val_loss: 0.1350 Epoch 8/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 42s 37ms/step - loss: 0.3298 - val_loss: 0.1390 Epoch 9/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 37s 32ms/step - loss: 0.3200 - val_loss: 0.1335 Epoch 10/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 45s 40ms/step - loss: 0.3061 - val_loss: 0.1264 Epoch 11/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 37s 32ms/step - loss: 0.2945 - val_loss: 0.1381 Epoch 12/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 45s 39ms/step - loss: 0.2997 - val_loss: 0.1268 Epoch 13/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 37s 32ms/step - loss: 0.2849 - val_loss: 0.1228 Epoch 14/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 47s 41ms/step - loss: 0.2883 - val_loss: 0.1193 Epoch 15/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 39s 34ms/step - loss: 0.2791 - val_loss: 0.1290 Epoch 16/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 46s 40ms/step - loss: 0.2707 - val_loss: 0.1323 Epoch 17/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 38s 33ms/step - loss: 0.2755 - val_loss: 0.1199 Epoch 18/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 46s 40ms/step - loss: 0.2725 - val_loss: 0.1154 Epoch 19/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 36s 32ms/step - loss: 0.2720 - val_loss: 0.1314 Epoch 20/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 46s 40ms/step - loss: 0.2689 - val_loss: 0.1242 Epoch 21/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 38s 33ms/step - loss: 0.2641 - val_loss: 0.1189 Epoch 22/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 48s 42ms/step - loss: 0.2634 - val_loss: 0.1176 Epoch 23/50 1144/1144 ━━━━━━━━━━━━━━━━━━━━ 41s 36ms/step - loss: 0.2559 - val_loss: 0.1213 Epoch 23: early stopping Restoring model weights from the end of the best epoch: 18.
# --- 1. Predict sequences ---
X_scaled = np.concatenate([X_train_scaled, X_test_scaled], axis=0)
ts_all = np.concatenate([ts_train, ts_test], axis=0)
X_pred = model.predict(X_scaled)
# --- 2. Compute reconstruction error per sequence ---
reconstruction_error_seq = np.mean(
np.square(X_scaled - X_pred),
axis=(1,2)
) # shape = (num_sequences,)
# --- 3. Determine anomaly threshold ---
threshold = np.percentile(reconstruction_error_seq, 95)
predicted_anomaly_seq = reconstruction_error_seq > threshold
print(f"Threshold for anomaly: {threshold}")
print(f"Number of anomalous sequences: {np.sum(predicted_anomaly_seq)}")
# --- 4. Add columns to signal_data directly ---
signal_data['reconstruction_error'] = 0.0
signal_data['predicted_anomaly'] = False
# Map sequence-level errors and anomalies to all points in each sequence (forward-fill)
for i, ts in enumerate(ts_all):
seq_mask = (signal_data.index <= ts)
seq_idx = signal_data[seq_mask].tail(SEQ_LENGTH).index
signal_data.loc[seq_idx, 'reconstruction_error'] = reconstruction_error_seq[i]
signal_data.loc[seq_idx, 'predicted_anomaly'] = predicted_anomaly_seq[i]
## Note: This mapping strategy propagates the sequence-level
# anomaly score back to individual timestamps.
## For production environments, this would be optimized via
# vectorized joins or a real-time inference buffer.
# --- 5. Plotting anomalies and train/test split ---
plt.figure(figsize=(16,5))
plt.plot(signal_data.index, signal_data['value'], label='Sensor Value')
plt.scatter(signal_data.index[signal_data['predicted_anomaly']],
signal_data['value'][signal_data['predicted_anomaly']],
color='red', s=20, label='Anomaly')
plt.axvline(
x=split_time,
color='orange',
linestyle='--',
linewidth=2,
label='Train/Test Split'
)
plt.title("Sensor Signal with Detected Anomalies")
plt.xlabel("Timestamp")
plt.ylabel("Value")
plt.legend()
plt.show()
1913/1913 ━━━━━━━━━━━━━━━━━━━━ 17s 8ms/step Threshold for anomaly: 0.3650253695243711 Number of anomalous sequences: 3061
The model is trained only on data that predominantly represents normal system behavior. Under this assumption, the LSTM autoencoder learns the normal temporal dynamics of the signal. Sequences that deviate from this learned pattern cannot be reconstructed accurately and therefore produce higher reconstruction error.
Reconstruction error is therefore used as an anomaly score.
The anomaly threshold is derived from the reconstruction error distribution of the training set. Since the training data is assumed to represent normal operation, large deviations from this distribution indicate abnormal behavior.
The threshold is set to the 95th percentile of the training reconstruction error. This ensures that only rare deviations from normal behavior are flagged as anomalies.
However, it should be noted that the threshold can be re-programmed.
mse = signal_data['reconstruction_error'].values
anomalies = signal_data['predicted_anomaly'].values
# Use the same threshold as before
threshold_value = np.percentile(mse, 95) # or use the threshold you computed earlier
plt.figure(figsize=(16,6))
plt.plot(mse, label="Reconstruction Error")
plt.axhline(
threshold_value,
color='red',
linestyle='--',
label='Threshold'
)
plt.scatter(
np.where(anomalies)[0],
mse[anomalies],
color='red',
label='Anomalies'
)
plt.title("LSTM Autoencoder Reconstruction Error (from signal_data)")
plt.xlabel("Index / Time Step")
plt.ylabel("Reconstruction Error")
plt.legend()
plt.show()
# representing sessions splitting info on the original signal info
sessions = signal_data['session_id'].unique()
num_sessions = len(sessions)
# Choose a colormap
colors = cm.get_cmap('tab20', num_sessions)
plt.figure(figsize=(16,6))
for i, sess in enumerate(sessions):
sess_df = signal_data[signal_data['session_id'] == sess]
plt.plot(sess_df.index, sess_df['value'], color=colors(i), label=f'Session {sess}')
# adding train/test split line
plt.axvline(
x=split_time,
color='orange',
linestyle='--',
linewidth=2,
label='Train/Test Split'
)
plt.title("Sensor Signal by Sessions")
plt.xlabel("Timestamp")
plt.ylabel("Sensor Value")
plt.tight_layout()
plt.show()
## threshold sensitivity study
thresholds = [90, 95, 97, 98, 99, 99.5, 99.9]
anomaly_counts = []
# Extract reconstruction error for train/test
train_errors = signal_data.loc[
signal_data['dataset_split'] == 'train',
'reconstruction_error'
]
test_errors = signal_data.loc[
signal_data['dataset_split'] == 'test',
'reconstruction_error'
]
for p in thresholds:
# Threshold calibrated only on training data
t = np.percentile(train_errors, p)
# Count anomalies in test data
count = np.sum(test_errors > t)
anomaly_counts.append(count)
# --- Plot ---
plt.figure(figsize=(10,5))
plt.plot(
thresholds,
anomaly_counts,
marker='o',
linestyle='--',
color='red'
)
plt.title("Threshold Sensitivity Analysis")
plt.xlabel("Strictness Percentile (Lower = More Alarms)")
plt.ylabel("Number of Detected Anomalies (Test Set)")
plt.grid(True)
plt.show()
## multivariate analysis of machine 3
machine_3_data = raw_data[raw_data["asset_id"] == "machine_3"]
machine_3_data.head()
| asset_id | timestamp | signal | value | |
|---|---|---|---|---|
| 3777391 | machine_3 | 2024-02-29 04:51:15 | fuelconsumption.idle.liter | 0.000 |
| 3777392 | machine_3 | 2024-02-29 04:51:15 | fuelconsumption.load.liter | 0.000 |
| 3777393 | machine_3 | 2024-02-29 04:51:15 | value.batteryvoltage.voltage | 23.915 |
| 3777394 | machine_3 | 2024-02-29 04:51:15 | value.chargingvoltage.voltage | 0.000 |
| 3777395 | machine_3 | 2024-02-29 04:51:15 | value.common.engine.fuel.used.total | 0.000 |
signals = machine_3_data["signal"].unique()
fig, axes = plt.subplots(len(signals), 1, figsize=(12, 10), sharex=True)
for i, signal in enumerate(signals):
signal_data = machine_3_data[machine_3_data["signal"] == signal]
axes[i].plot(signal_data["timestamp"], signal_data["value"])
axes[i].set_title(signal)
axes[i].set_ylabel("value")
axes[-1].set_xlabel("timestamp")
plt.suptitle("Machine 3 - Sensor Signals")
plt.tight_layout()
plt.show()
these plots indicate that for machine_3.... the selected battery voltage and "hydraulicoil temperature" and "hydraulicoilfanacutalspeed" can be highly correlated. while other signals do not show distinctive patterns. lets quantify the same using correlations plots.
selected_signals = [
"value.batteryvoltage.voltage",
"value.hydraulicoil.temperature",
"value.hydraulicoilfanactualspeed.rotation"
]
fig, axes = plt.subplots(len(selected_signals), 1, figsize=(16, 3 * len(selected_signals)), sharex=True)
for i, signal in enumerate(selected_signals):
signal_data = machine_3_data[machine_3_data["signal"] == signal]
axes[i].plot(signal_data["timestamp"], signal_data["value"], label=signal)
axes[i].set_title(signal)
axes[i].set_ylabel("Value")
axes[i].grid(True)
axes[-1].set_xlabel("Timestamp")
plt.suptitle("Machine 3 -prominent Sensor Signals", fontsize=16)
plt.subplots_adjust(top=0.92)
plt.xticks(rotation=45)
plt.show()
# Lets also extract the same correlation from the linear correlations between sensors plot.
machine_data = raw_data[raw_data['asset_id'] == 'machine_3']
machine_data_pivot = machine_data.pivot_table(
index='timestamp',
columns='signal',
values='value',
aggfunc='mean'
)
corr_matrix = machine_data_pivot.corr()
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap for Machine Signals")
plt.show()
Statistical Observations and Physical Correlations (Machine 3)
- The exploratory data analysis and subsequent correlation study for Machine 3 reveal a distinct relationship between electrical performance and thermal operational states.
- Identification of Voltage AnomaliesUsing the custom Anomaly Score algorithm—which weighs anomaly frequency, magnitude, and peak intensity—the value.batteryvoltage.voltage signal for Machine 3 was identified as the most significant candidate for modeling. While the fleet-wide baseline for battery voltage remains relatively stable, Machine 3 exhibits localized "spikes" and "dips" that deviate from standard charging/discharging profiles.
- The Correlation: Voltage vs. Hydraulic OilThe cross-signal correlation analysis indicates a notable dependency between battery voltage fluctuations and Hydraulic Oil Temperature (value.hydraulicoil.temperature).
- Observation: Significant drops in battery voltage frequently coincide with rising trends or peak states in hydraulic oil temperature.
- Hypothesis: This linkage suggests that as the hydraulic system reaches high-load or high-temperature states, the increased demand on the machine's cooling fans (e.g., value.hydraulicoilfanactualspeed.rotation) or auxiliary electrical systems may be placing an intermittent strain on the electrical subsystem.
- Significance: This correlation transforms the voltage "noise" into a predictable operational pattern. It suggests that "abnormal" voltage behavior may actually be a leading indicator of thermal stress or mechanical load in the hydraulic system.
- Implications for Predictive Maintenance
- By establishing this link, the LSTM Autoencoder is not merely detecting random sensor glitches; it is learning to identify the "fingerprint" of the machine's energy-thermal balance. Deviations from this correlated relationship (e.g., a voltage drop occurring without a corresponding hydraulic load) can be flagged as a higher-priority anomaly, potentially indicating a failing alternator or a degrading battery cell before a total system failure occurs.
as expected, the battery voltage is correlated with hydraulicoil signals (altough smaller, but still present)
Task 3: Outlook on Scalability¶
To transition this proof-of-concept into a production-grade system, I suggest the following strategies for expansion:
- Multivariate Integration (Multi-Channel): I propose expanding the input tensor from a univariate shape $(30, 1)$ to a multivariate shape $(30, N)$. This would allow the model to learn cross-signal correlations and hidden physical dependencies. For example, the model could learn that high engine RPM coupled with zero fuel consumption is a physical impossibility, flagging it as an anomaly even if both signals appear "normal" in isolation.
- Asset-Specific Adaptation (Multiple Assets): For a fleet-wide rollout, I suggest a "Global-to-Local" training architecture. We can develop a Global Model trained on the aggregated data of all machines to capture baseline behavior. We can then utilize Transfer Learning to fine-tune specific layers for individual assets, effectively accounting for varying wear levels, environmental conditions, and age across the fleet.
- Federated Learning for Decentralized Scaling: As a key strategy for large-scale deployment, I suggest exploring Federated Learning. This approach would allow us to train and update models across the entire fleet of material handlers without the need to transmit massive volumes of raw, high-frequency sensor data to a central server. This not only significantly reduces bandwidth overhead but also ensures data privacy and security, as only the model weight updates are shared to improve the global intelligence of the system.
Task 4: Predictive Approach and Pattern Matching¶
To move the system from reactive detection to Prescriptive Maintenance, I propose a pattern-similarity framework:
- Anomaly Fingerprinting: When the LSTM Autoencoder identifies an anomaly, the specific 30-step window of reconstruction error is saved as a "Failure Fingerprint."
- Similarity Analysis via Dynamic Time Warping (DTW): I suggest comparing current anomaly fingerprints against a curated Library of Historical Failures. By using Dynamic Time Warping (DTW), we can calculate similarity scores that are robust to temporal shifts or variations in the speed of the failure's progression.
- Actionable Intelligence: This approach enables "predictability" by linking current patterns to past mitigations. If the current signal pattern matches a historical record of "Alternator Failure," the system can provide the specific mitigation strategy used previously (e.g., "Inspect belt tension") rather than just a generic alert. This shifts the focus from "what is happening" to "how to fix it."
## exploring other machines correlations for better understanding
machine_data = raw_data[raw_data['asset_id'] == 'machine_2']
machine_data_pivot = machine_data.pivot_table(
index='timestamp',
columns='signal',
values='value',
aggfunc='mean'
)
corr_matrix = machine_data_pivot.corr()
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap for Machine Signals")
plt.show()
benchmark models - Statistical models using z score and Isolation forest models¶
# benching marking the anomaly detection algorithm using stastical approach of using Z score for anomaly detection
# ------------------------------
# 1. Select your signal
# ------------------------------
# Example: top signal per previous selection
asset = 'machine_3'
signal = 'value.batteryvoltage.voltage'
# Filter data for this asset and signal
signal_data = raw_data[
(raw_data['asset_id'] == asset) &
(raw_data['signal'] == signal)
][['timestamp','value']].copy()
# ------------------------------
# 2. Compute Z-score
# ------------------------------
signal_data['z_score'] = zscore(signal_data['value'])
# ------------------------------
# 3. Mark anomalies (|Z| > threshold)
# ------------------------------
z_threshold = 3
signal_data['anomaly'] = np.abs(signal_data['z_score']) > z_threshold
# ------------------------------
# 4. Plot the signal with anomalies
# ------------------------------
plt.figure(figsize=(16,6))
plt.plot(signal_data['timestamp'], signal_data['value'], label='Signal')
plt.scatter(
signal_data['timestamp'][signal_data['anomaly']],
signal_data['value'][signal_data['anomaly']],
color='red', label='Anomaly'
)
plt.title(f"Anomaly Detection for {asset} - {signal}")
plt.xlabel('Timestamp')
plt.ylabel('Value')
plt.legend()
plt.show()
# ------------------------------
# 5. Summary of anomalies
# ------------------------------
num_anomalies = signal_data['anomaly'].sum()
total_points = len(signal_data)
anomaly_rate = num_anomalies / total_points * 100
print(f"Signal: {signal}")
print(f"Asset: {asset}")
print(f"Total points: {total_points}")
print(f"Number of anomalies: {num_anomalies}")
print(f"Anomaly rate: {anomaly_rate:.2f}%")
Signal: value.batteryvoltage.voltage Asset: machine_3 Total points: 70332 Number of anomalies: 964 Anomaly rate: 1.37%
# ------------------------------
# 1. signal selection
# ------------------------------
asset = 'machine_3'
signal = 'value.batteryvoltage.voltage'
# Filter data for this asset and signal
signal_data = raw_data[
(raw_data['asset_id'] == asset) &
(raw_data['signal'] == signal)
][['timestamp','value']].copy()
# ------------------------------
# 2. Fit Isolation Forest
# ------------------------------
model = IsolationForest(
n_estimators=100, # number of trees
contamination='auto', # automatically estimate proportion of anomalies
random_state=42
)
signal_values = signal_data['value'].values.reshape(-1, 1)
model.fit(signal_values)
# ------------------------------
# 3. Predict anomalies
# ------------------------------
# -1 indicates anomaly, 1 indicates normal
signal_data['anomaly'] = model.predict(signal_values)
signal_data['anomaly'] = signal_data['anomaly'] == -1
# ------------------------------
# 4. Plot the signal with anomalies
# ------------------------------
plt.figure(figsize=(16,6))
plt.plot(signal_data['timestamp'], signal_data['value'], label='Signal')
plt.scatter(
signal_data['timestamp'][signal_data['anomaly']],
signal_data['value'][signal_data['anomaly']],
color='red', label='Anomaly'
)
plt.title(f"Isolation Forest Anomaly Detection for {asset} - {signal}")
plt.xlabel('Timestamp')
plt.ylabel('Value')
plt.legend()
plt.show()
# ------------------------------
# 5. Summary
# ------------------------------
num_anomalies = signal_data['anomaly'].sum()
total_points = len(signal_data)
anomaly_rate = num_anomalies / total_points * 100
print(f"Signal: {signal}")
print(f"Asset: {asset}")
print(f"Total points: {total_points}")
print(f"Number of anomalies: {num_anomalies}")
print(f"Anomaly rate: {anomaly_rate:.2f}%")
Signal: value.batteryvoltage.voltage Asset: machine_3 Total points: 70332 Number of anomalies: 5577 Anomaly rate: 7.93%