I am trying to apply multivariate forecasting to a DataFrame like this:
I transformed the dataset with one-hot encoding to feed it to an LSTM:
one_hot_encoder = OneHotEncoder(handle_unknown='ignore')
results = one_hot_encoder.fit_transform(cal_automn)
df_results = pd.DataFrame.sparse.from_spmatrix(results)
df_results.columns = one_hot_encoder.get_feature_names(df.columns)
df_results
This seems to be OK. Then I preprocess the data into windows for the LSTM:
def prep_data(df, window_size = 5):
    df_as_np = df.to_numpy()
    X = [] # x shape will be the number of training examples by the number of time steps we are using times the number of variables we are using
    y = []
    for i in range(len(df_as_np) - window_size): # iterate through the dataframe with an index
        row = [r for r in df_as_np[i:i + window_size]]
        X.append(row)
        label = [df_as_np[i + window_size][0], df_as_np[i + window_size][1]] # we want to predict the whole next row "the next day"
        y.append(label)
    return np.array(X), np.array(y)
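To make the shapes concrete, here is a minimal sketch of the same windowing on a toy array. The helper name `make_windows` and the toy values are hypothetical and only for illustration. Note that the label is simply columns 0 and 1 of whatever matrix is passed in; with a one-hot encoded matrix those are two indicator columns, which are not necessarily the event type and the duration.

```python
import numpy as np

def make_windows(data, window_size=5):
    """Slide a window over the rows; the target is columns 0 and 1 of the row after each window."""
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i + window_size])       # window_size consecutive rows
        y.append(data[i + window_size, :2])     # first two columns of the next row
    return np.array(X), np.array(y)

# Toy matrix: 8 "days", 3 columns (values are made up for illustration).
toy = np.arange(24, dtype=float).reshape(8, 3)
X_toy, y_toy = make_windows(toy, window_size=5)

print(X_toy.shape)  # (3, 5, 3)
print(y_toy.shape)  # (3, 2)
print(y_toy[0])     # [15. 16.]
```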
My inputs are consecutive days, each with a string and a float. I want to train the LSTM to predict the next day of the DataFrame, with an output like this:
2021-12-1 Z 86400.0 (Z: the event type, 86400.0: the duration of the event type in seconds)
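The desired output mixes a category (the event type) and a number (the duration). `OneHotEncoder` treats every distinct value in a column as its own category, floats included, which may be part of why the encoded matrix ends up with 291 columns. Below is a minimal sketch of an alternative encoding that one-hot encodes only the event type and keeps the duration numeric. The sample rows, the helper `encode_rows`, and the scaling by one day are assumptions for illustration, not taken from the real dataset.

```python
import numpy as np

# Made-up sample rows: (event type, duration in seconds).
rows = [("Z", 86400.0), ("A", 3600.0), ("Z", 43200.0)]

event_types = sorted({event for event, _ in rows})        # ['A', 'Z']
index_of = {event: k for k, event in enumerate(event_types)}

def encode_rows(rows, seconds_per_day=86400.0):
    """One-hot encode the event type; keep the duration as one scaled numeric column."""
    out = np.zeros((len(rows), len(event_types) + 1))
    for r, (event, duration) in enumerate(rows):
        out[r, index_of[event]] = 1.0              # categorical part
        out[r, -1] = duration / seconds_per_day    # numeric part, in [0, 1] for durations up to one day
    return out

encoded = encode_rows(rows)
print(encoded.shape)  # (3, 3)
print(encoded[0])     # [0. 1. 1.]
```

With this layout the target for the next day could be split into a class part (the one-hot columns) and a regression part (the last column), each with its own loss.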
Then I split my data:
X2, y2 = prep_data(df_results) #2 the number of input values
X2.shape, y2.shape #((2452, 5, 291), (2452, 2))
X2_train, y2_train = X2[:2000], y2[:2000]
X2_val, y2_val = X2[2000:2100], y2[2000:2100]
X2_test, y2_test = X2[2100:2452], y2[2100:2452]
X2_train.shape, y2_train.shape, X2_val.shape, y2_val.shape, X2_test.shape, y2_test.shape
#((2000, 5, 291), (2000, 2), (100, 5, 291), (100, 2), (352, 5, 291), (352, 2))
I define, compile, and fit the model:
model1 = Sequential()
#model1.add(Masking(mask_value=" ", input_shape=(6, y2)))
#model1.add(LSTM(lstm_units))
#model1.add(InputLayer(input_shape=(max_sequence_length, num_chars)))
model1.add(InputLayer((5,291)))
model1.add(LSTM(64))
model1.add(Dense(8, 'relu'))
model1.add(Dense(2, 'sigmoid')) # outputs 2 things
model1.summary()
cp1 = ModelCheckpoint('model1/', save_best_only=True)
model1.compile(loss = 'categorical_crossentropy', optimizer=Adam(learning_rate=0.0001), metrics=['accuracy', 'RootMeanSquaredError']) # loss= 'binary_crossentropy'
model1.fit(X2_train, y2_train, validation_data=(X2_val, y2_val), epochs=10, callbacks=[cp1])
The accuracy is very bad, but I continue.
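One possible contributor to the poor accuracy is a mismatch between the targets and the loss. Categorical cross-entropy is -sum(y * log(p)) and expects each target row to be a one-hot (or probability) vector. If a target row is all zeros, which can happen when the two target columns are indicators for only two of many categories, the loss is zero for any prediction. The sketch below shows the bare formula in NumPy; it is not Keras' implementation, and the arrays are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Plain categorical cross-entropy per row: -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[1.0, 0.0],    # a proper one-hot target
                   [0.0, 0.0]])   # an all-zero target row
y_pred = np.array([[0.5, 0.5],
                   [0.9, 0.1]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)  # first entry is log(2), second entry is 0
```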
Assuming the model has learned something, I try to check it on the test set:
model1.predict(X2_test.all(), y2_test.all())
However, when I do that I get this error:
ValueError: Failed to find data adapter that can handle input: <class 'numpy.bool_'>, <class 'NoneType'>
(image: model results)
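The `numpy.bool_` in the message matches what `ndarray.all()` returns: it reduces the whole array to a single truth value rather than selecting all rows. A small NumPy-only sketch of that behaviour, where `X_demo` is a made-up stand-in for `X2_test`:

```python
import numpy as np

X_demo = np.ones((4, 5, 3))   # stand-in for X2_test (made-up shape)

reduced = X_demo.all()        # collapses the whole array to one boolean
print(isinstance(reduced, np.bool_))  # True
print(reduced.shape)                  # ()
print(X_demo.shape)                   # (4, 5, 3)
```

As far as I know, Keras `Model.predict` takes only the inputs as its first argument and its second positional argument is `batch_size`, so the labels are not passed to it. A call of the form `model1.predict(X2_test)` would hand the model the full array.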
Furthermore, when I try to collect the model results in a DataFrame and plot them, I get an empty DataFrame and an empty plot.
(image: plot visualization)
Do you have any idea what the problem is? I suspect it has to do with the one-hot encoded initial DataFrame, but I am not sure.