My data is a dataframe with 25 columns and 2737 rows containing binary data.
The goal is to train a model using each row as the input and to get as the output a probabilistic prediction of what the next sequence could be.
In this scenario every row always contains ten 0s and fifteen 1s (10 False and 15 True).
The first 5 rows of my dataframe are:
[[0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 1]
[1 0 0 1 1 1 1 0 1 0 1 1 1 0 1 1 0 0 1 1 0 0 1 1 0]
[1 0 0 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0]
[1 1 0 1 1 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 1 1]
[1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 1 0 0 1 1 0 0 1 1 1]]
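The 10/15 split can be checked directly on the rows shown above. This is only a minimal sketch: the name `sample` is illustrative, and on the full data the same check would presumably run on `np.array(df)`.

```python
import numpy as np

# The five rows printed above, copied verbatim (illustrative name: sample)
sample = np.array([
    [0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1],
])

# Number of 1s and 0s in each row
ones_per_row = sample.sum(axis=1)
zeros_per_row = sample.shape[1] - ones_per_row

# True only if every row has exactly fifteen 1s and ten 0s
rows_are_balanced = bool(np.all(ones_per_row == 15) and np.all(zeros_per_row == 10))
```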
I'm stuck on how to format it for use with Stratified K-Fold cross-validation.
The best approach I have come up with is this:
X = np.array(df) # Input features
y = np.roll(X, -1, axis=1) # Output targets
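It may be worth checking what the shift actually does. The sketch below uses a toy 3x3 array (not the real data) to compare `np.roll` along each axis. Assuming "next sequence" means the next row, `axis=0` or plain slicing might be closer to the intent than `axis=1`, but that is an assumption about the goal, and the names `toy`, `X_pairs` and `y_pairs` are illustrative.

```python
import numpy as np

# Toy stand-in for the dataframe values; each row is one "sequence"
toy = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# axis=1 rotates the values inside each row (column shift)
shifted_within_rows = np.roll(toy, -1, axis=1)

# axis=0 rotates whole rows, so row i is paired with row i+1;
# the last row wraps around and gets paired with the first one
shifted_rows = np.roll(toy, -1, axis=0)

# Slicing pairs each row with the following row without the wrap-around
X_pairs = toy[:-1]   # rows 0 .. n-2 as inputs
y_pairs = toy[1:]    # rows 1 .. n-1 as targets
```

With slicing, the input and target arrays have one row fewer than the original data, and the wrapped last-to-first pair never appears.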
But when I try to train using:
for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # train the model
    history = model.fit(X_train,
                        y_train,
                        epochs=10000,
                        batch_size=24,
                        callbacks=callbacks_list,
                        verbose=1,
                        validation_data=(X_test, y_test)
                        )
I get this error:
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
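The error appears to come from `skf.split(X, y)` rather than from the model: `StratifiedKFold` expects a 1-D array of class labels, while `y` here is a 2-D array with 25 binary columns. The sketch below uses random toy data (not the real dataframe) to show how scikit-learn classifies such a target, and tries plain `KFold` as one possible workaround. Whether unstratified folds are acceptable for this problem is an assumption; the fold count and seed are arbitrary.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils.multiclass import type_of_target

# Toy stand-in for the real data: 20 rows of 25 binary values
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 25))
y = np.roll(X, -1, axis=1)  # same target construction as in the question

# How scikit-learn classifies a 2-D binary target
target_type = type_of_target(y)

# KFold splits on row indices only, so it does not inspect y at all
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_shapes = []
for fold, (train_index, test_index) in enumerate(kf.split(X)):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Record the shapes so they can be compared with the model's input/output
    fold_shapes.append((X_train.shape, y_train.shape, X_test.shape, y_test.shape))
```

Here `y_train` keeps its 25 columns in every fold, which matches a 25-unit output layer.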
My model is:
# Initialising the FFNN
model = Sequential()
# Input Layer
model.add(InputLayer(input_shape=(25,)))
# First Hidden Layer
model.add(Dense(250, activation='relu'))
# First Dropout layer
model.add(Dropout(0.2))
# Second Hidden Layer
model.add(Dense(50, activation='relu'))
# Output Layer
model.add(Dense(25, activation=tf.nn.sigmoid))
I'm compiling the model with:
# Compile the model
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
    metrics=[tf.keras.metrics.AUC(name='auc', from_logits=False)]
)
I have tried other approaches, which produce errors like the one below:
ValueError: logits and labels must have the same shape ((24, 25) vs (24, 1))
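This second message suggests that the model emitted 25 values per sample (batch of 24) while the labels had only one value per sample. The sketch below is a NumPy stand-in, not the Keras implementation: `binary_crossentropy` is a hypothetical helper that mimics the same-shape requirement, to illustrate that a 25-unit sigmoid output needs labels of shape `(batch, 25)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy that refuses mismatched shapes (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(
            f"shape mismatch: predictions {y_pred.shape} vs labels {y_true.shape}"
        )
    # Clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

# Predictions shaped like the output of a 25-unit sigmoid layer, batch of 24
preds = np.full((24, 25), 0.5)

# Labels with one column per output unit: shapes agree
labels_ok = np.ones((24, 25))
loss_ok = binary_crossentropy(labels_ok, preds)

# Labels with a single column: shapes disagree, as in the error above
labels_bad = np.ones((24, 1))
try:
    binary_crossentropy(labels_bad, preds)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```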