
I have time-series data from sensors:

+----+----------+----------+------+
|day |Feature 1 |Feature 2 |target|
+----+----------+----------+------+
|0   |0.2       |0.1       |0.01  |
+----+----------+----------+------+
|... |...       |...       |...   |
+----+----------+----------+------+

(the rows continue until day 30)

I've built an LSTM model that predicts the target value of day 30 based on the first 7 days:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
# input_shape is (timesteps, features): one sample is the first 7 days of readings
model.add(LSTM(32, activation='tanh', input_shape=(7, num_features)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='mse', optimizer="adam", metrics=['mae', 'mse'])
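
For completeness, this is roughly how the training data is shaped and fed to the model (the array names and sizes below are just placeholders):

import numpy as np

num_features = 2                             # Feature 1 and Feature 2
X = np.random.rand(1000, 7, num_features)    # placeholder: 1000 windows of the first 7 days
y = np.random.rand(1000)                     # placeholder: the day-30 target for each window

model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)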

The model's MSE is 0.05, but when I look at the data I can see that in the majority of cases the day-30 target falls within a narrow range. So the model is correct most of the time and misses whenever there is an anomaly (which is exactly what I'm trying to catch).
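
To illustrate, this is roughly how I split the error between the usual range and the anomalies (the range bounds and the held-out arrays X_val / y_val are placeholders):

import numpy as np

lo, hi = 0.0, 0.1                       # placeholder bounds of the "usual" day-30 target range
preds = model.predict(X_val).ravel()    # X_val, y_val: a held-out set of windows and targets
normal = (y_val >= lo) & (y_val <= hi)

print("MSE on normal days:   ", np.mean((preds[normal] - y_val[normal]) ** 2))
print("MSE on anomalous days:", np.mean((preds[~normal] - y_val[~normal]) ** 2))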

I've looked at techniques for handling imbalanced data in classification problems, like over-sampling, under-sampling and SMOTE. However, I couldn’t find anything for a time-series regression problem.
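
The closest regression analogue I can think of is weighting the rare targets more heavily via Keras' sample_weight argument — a rough sketch, where the quantile bounds and the weight of 10 are arbitrary placeholders:

import numpy as np

# Up-weight samples whose day-30 target falls outside the 10th–90th percentile band
lo, hi = np.quantile(y, [0.1, 0.9])
weights = np.where((y < lo) | (y > hi), 10.0, 1.0)

model.fit(X, y, sample_weight=weights, epochs=50, batch_size=32, validation_split=0.2)

But I'm not sure whether this (or resampling whole windows) is a sound approach for time-series data, hence the question.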


1 Answer

I don't know anything about sensor data, but can you not impute missing data elements?

import numpy as np
from sklearn.impute import SimpleImputer

# Fit the imputer on data with known values; missing entries get replaced by the column mean
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit([[1, 2], [np.nan, 3], [7, 6]])

X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(X)
print(imp.transform(X))

Result:

[[nan, 2], [6, nan], [7, 6]]


[[4.         2.        ]
 [6.         3.66666667]
 [7.         6.        ]]

https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html
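
The link above is to IterativeImputer rather than SimpleImputer; if you prefer that route, the usage is similar, but note it is still experimental and needs an explicit enable import — a minimal sketch:

from sklearn.experimental import enable_iterative_imputer  # noqa: F401, required before importing IterativeImputer
from sklearn.impute import IterativeImputer
import numpy as np

imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit([[1, 2], [np.nan, 3], [7, 6]])

X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(imp.transform(X))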
