
I have time-series data from sensors:

+----+----------+----------+------+
|day |Feature 1 |Feature 2 |target|
+----+----------+----------+------+
|0   |0.2       |0.1       |0.01  |
+----+----------+----------+------+
|... |...       |...       |...   |
+----+----------+----------+------+

(the rows continue until day 30)

I've built an LSTM model that predicts the target value of day 30 based on the first 7 days:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
# input_shape is (timesteps, features): one sample is the first 7 days of readings
model.add(LSTM(32, activation='tanh', input_shape=(7, num_features)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mse', optimizer="adam", metrics=['mae', 'mse'])
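
For completeness, this is roughly how the training data is shaped and fed to the model (the array names and sizes below are just placeholders):

import numpy as np

num_features = 2                             # Feature 1 and Feature 2
X = np.random.rand(1000, 7, num_features)    # placeholder: 1000 windows of the first 7 days
y = np.random.rand(1000)                     # placeholder: the day-30 target for each window

model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)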

The model's MSE is 0.05, but when I look at the data I can see that in the majority of cases the day-30 target falls within a narrow range. So the model is correct most of the time and misses whenever there is an anomaly (which is exactly what I'm trying to catch).
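
To illustrate, this is roughly how I split the error between the usual range and the anomalies (the range bounds and the held-out arrays X_val / y_val are placeholders):

import numpy as np

lo, hi = 0.0, 0.1                       # placeholder bounds of the "usual" day-30 target range
preds = model.predict(X_val).ravel()    # X_val, y_val: a held-out set of windows and targets
normal = (y_val >= lo) & (y_val <= hi)

print("MSE on normal days:   ", np.mean((preds[normal] - y_val[normal]) ** 2))
print("MSE on anomalous days:", np.mean((preds[~normal] - y_val[~normal]) ** 2))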

I've looked at techniques for handling imbalanced data in classification problems, like over-sampling, under-sampling and SMOTE. However, I couldn’t find anything for a time-series regression problem.
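
The closest regression analogue I can think of is weighting the rare targets more heavily via Keras' sample_weight argument — a rough sketch, where the quantile bounds and the weight of 10 are arbitrary placeholders:

import numpy as np

# Up-weight samples whose day-30 target falls outside the 10th–90th percentile band
lo, hi = np.quantile(y, [0.1, 0.9])
weights = np.where((y < lo) | (y > hi), 10.0, 1.0)

model.fit(X, y, sample_weight=weights, epochs=50, batch_size=32, validation_split=0.2)

But I'm not sure whether this (or resampling whole windows) is a sound approach for time-series data, hence the question.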


1 Answer

I don't know anything about sensor data, but can you not impute missing data elements?

import numpy as np
from sklearn.impute import SimpleImputer

# Fit the imputer on data with known values; missing entries get replaced by the column mean
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit([[1, 2], [np.nan, 3], [7, 6]])

X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(X)
print(imp.transform(X))

Result:

[[nan, 2], [6, nan], [7, 6]]


[[4.         2.        ]
 [6.         3.66666667]
 [7.         6.        ]]

https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html
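
The link above is to IterativeImputer rather than SimpleImputer; if you prefer that route, the usage is similar, but note it is still experimental and needs an explicit enable import — a minimal sketch:

from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required before importing IterativeImputer
from sklearn.impute import IterativeImputer
import numpy as np

imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit([[1, 2], [np.nan, 3], [7, 6]])

X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(imp.transform(X))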
