I have a table with 1799 users in the rows and 31 features in the columns. The last column is a two-class condition feature that tells the model which condition each user belongs to. I understand that an LSTM needs 3-D input, so I used Reshape((31, 1)), since I don't have time-series data. I also understood that input_shape takes the number of features.

My issue is that I want to predict a new set of users who have the same 30 features and get a classification result saying which condition each user belongs to. Ideally the result would also give the probability of each condition. I tried to use model.predict for this, and it returned a NumPy array predict_prob with shape (200, 31, 1). This confuses me: the input is 200 samples of shape (31 x 1), so I expected one predicted condition per user, i.e. an output of shape (200,). Why is the result 3-D, and how should I convert it to a DataFrame so that I can save it as a .csv file? Thank you in advance.
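A small NumPy/pandas sketch of the shapes involved may help. It uses random placeholder data instead of the real table and assumes the label lives in a column named 'Conditions' (as in the code below); `feature_cols`, `X_one_step` and `X_per_feature` are hypothetical names. It drops the label from the inputs and then builds the two possible 3-D layouts: one timestep carrying all features, versus what Reshape((31, 1)) does, which turns every column into its own one-feature timestep.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_data: 1799 users, 30 feature columns plus a
# 0/1 'Conditions' label column (random values, for illustration only).
rng = np.random.default_rng(0)
n_users, n_features = 1799, 30
raw_data = pd.DataFrame(rng.normal(size=(n_users, n_features)),
                        columns=[f"f{i}" for i in range(n_features)])
raw_data["Conditions"] = rng.integers(0, 2, size=n_users)

# Keep the label out of the inputs: only the 30 real features go to the model.
feature_cols = [c for c in raw_data.columns if c != "Conditions"]
X = raw_data[feature_cols].to_numpy(dtype="float32")   # shape (1799, 30)
y = raw_data["Conditions"].to_numpy()                   # shape (1799,)

# Keras RNN layers expect (batch, timesteps, features).
# Layout A: one timestep per user, carrying all 30 features.
X_one_step = X.reshape(-1, 1, n_features)               # shape (1799, 1, 30)
# Layout B: what a Reshape((n, 1)) does, one feature per timestep.
X_per_feature = X[..., np.newaxis]                      # shape (1799, 30, 1)

print(X.shape, X_one_step.shape, X_per_feature.shape)
```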

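import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Bidirectional, LSTM, Dropout, Dense

X = raw_data[feature_names]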
P = predict_data_raw[feature_names]
P1 = predict_data_raw[feature_names1]

#Training
y = raw_data['Conditions']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=22, test_size=0.1)
X_test = np.expand_dims(X_test, axis=2)
# fit and evaluate a model
model = Sequential()
model.add(Reshape((31,1)))
model.add(Bidirectional(LSTM(10, return_sequences=True),input_shape=(31,)))
model.add(Dropout(0.5))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
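# store the training history under a new name; reusing "LSTM" would shadow the imported layer class
history = model.fit(X_train, y_train, epochs=5, batch_size=10)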
# evaluate the keras model
_, accuracy = model.evaluate(X_test, y_test)
print('Accuracy: %.2f' % (accuracy*100))

predict_prob = model.predict(X_test)

df = pd.DataFrame(predict_prob,  columns=["Prediction"])
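The (200, 31, 1) shape can be reproduced without Keras. The NumPy sketch below uses random numbers in place of the trained network and relies on one property of Keras' Dense layer: for inputs with more than two dimensions it operates on the last axis only. `dense`, `relu` and `sigmoid` are hypothetical helpers written for this illustration, not Keras functions, and the batch size of 200 is chosen only to match the reported shape.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
batch, timesteps, lstm_units = 200, 31, 10


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense(x, units, activation):
    """Mimic a Dense layer: a matmul over the LAST axis only.

    Weights are random here; only the output shapes matter for this demo.
    """
    w = rng.normal(size=(x.shape[-1], units))
    b = np.zeros(units)
    return activation(x @ w + b)


# Stand-in for Bidirectional(LSTM(10, return_sequences=True)) output:
# a 20-dim vector (10 forward + 10 backward, concatenated) for EVERY timestep.
seq_out = rng.normal(size=(batch, timesteps, 2 * lstm_units))

h = dense(seq_out, 8, relu)     # (200, 31, 8): the timestep axis survives
probs = dense(h, 1, sigmoid)    # (200, 31, 1): one value per timestep, not per user
print(probs.shape)

# The DataFrame constructor rejects 3-D NumPy arrays, so this raises.
try:
    pd.DataFrame(probs, columns=["Prediction"])
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Stand-in for the (batch, 20) output of Bidirectional(LSTM(10)) with the
# default return_sequences=False: one vector per user, so the same layer
# structure ends in one probability per user.
final_state = rng.normal(size=(batch, 2 * lstm_units))
per_user = dense(dense(final_state, 8, relu), 1, sigmoid)
print(per_user.shape)           # (200, 1)
```

In other words, each of the 31 "probabilities" in a row belongs to one input column treated as a timestep, which is why none of them is a per-user answer on its own.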

  • If you don't have timesteps (why use an LSTM in that case?), each sample's input should have shape (1, 31), i.e. [batch, timesteps, features] = (batch, 1, 31) (https://keras.io/api/layers/recurrent_layers/lstm/). If the last of the 31 columns is your output classification, it should not be part of your inputs: use only 30 columns, the same number as in the data you want to predict. Your LSTM also has return_sequences=True, which makes it output a hidden state for every timestep; the Dense layers accept that and keep the shape, hence the 3-D output. – SajanGohil Sep 28 '22 at 10:24
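If the last LSTM layer uses return_sequences=False (the Keras default), model.predict should return one sigmoid value per user, shape (n, 1). Below is a sketch of turning that into per-condition probabilities and CSV text. It assumes 'Conditions' is encoded as 0/1, so the sigmoid output is P(condition 1); the function name `predictions_to_frame`, the column names, `user_ids` and the 0.5 threshold are illustrative choices, not anything from the question.

```python
import numpy as np
import pandas as pd


def predictions_to_frame(predict_prob, user_ids, threshold=0.5):
    """Turn a (n_users, 1) sigmoid output into a per-user table.

    predict_prob: model.predict(...) output, assumed to be shape (n, 1)
                  holding P(Conditions == 1) for each user.
    user_ids:     hypothetical identifiers, one per row of the prediction data.
    """
    p1 = np.asarray(predict_prob, dtype=float).reshape(-1)  # (n, 1) -> (n,)
    if p1.shape[0] != len(user_ids):
        # e.g. a (200, 31, 1) output flattens to 6200 values and is rejected
        raise ValueError("expected one probability per user; "
                         f"got {p1.shape[0]} values for {len(user_ids)} users")
    return pd.DataFrame({
        "user_id": user_ids,
        "prob_condition_0": 1.0 - p1,
        "prob_condition_1": p1,
        "predicted_condition": (p1 >= threshold).astype(int),
    })


# Usage with made-up numbers standing in for a per-user model.predict output:
fake_prob = np.array([[0.9], [0.2], [0.5]])
df = predictions_to_frame(fake_prob, user_ids=["u1", "u2", "u3"])
csv_text = df.to_csv(index=False)  # returns the CSV as a string when no path is given
print(csv_text)
```

Passing a path instead, e.g. df.to_csv("predictions.csv", index=False), writes the file to disk. The length check is there so that an accidental 3-D output fails loudly instead of being silently flattened into thousands of rows.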
