What's the correct way to format X and y from a binary DataFrame for use with Stratified K-Fold cross-validation?

Question

My data is a DataFrame with 25 columns and 2737 rows containing binary data.

The goal is to train a model that takes each row as input and outputs a probabilistic prediction of what the next sequence could be.

Every row in this scenario always contains ten 0s and fifteen 1s (that is, 10 False and 15 True).

The first 5 rows of my DataFrame are:

[[0 1 1 0 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 1]
 [1 0 0 1 1 1 1 0 1 0 1 1 1 0 1 1 0 0 1 1 0 0 1 1 0]
 [1 0 0 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0]
 [1 1 0 1 1 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 1 1]
 [1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 1 0 0 1 1 0 0 1 1 1]]

I'm stuck on how to format it for use with Stratified K-Fold cross-validation.

The best approach I have found is this:

X = np.array(df)  # Input features
y = np.roll(X, -1, axis=1)  # Output targets
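
It may help to check what `np.roll` does here. With `axis=1` the shift happens inside each row (column 0 moves to the end), so `y` is a rotated copy of the same row. If "next sequence" means the next row of the DataFrame, one possible construction, sketched below on a small made-up array rather than the real data, pairs each row with the row that follows it and drops the last row, which has no successor. The names `X_in` and `y_next` are illustrative only.

```python
import numpy as np

# Toy stand-in for the real (2737, 25) array; the values are made up.
X = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])

# axis=1 rotates the columns inside each row.
rolled_cols = np.roll(X, -1, axis=1)

# Assumption: the target for row i is row i + 1.
X_in = X[:-1]    # every row except the last
y_next = X[1:]   # every row except the first

print(rolled_cols[0])            # row 0 rotated left by one position
print(X_in.shape, y_next.shape)  # both arrays have one row fewer than X
```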

But when I try to train using:

for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # train the model
    history = model.fit(X_train,
                        y_train,
                        epochs=10000,
                        batch_size=24,
                        callbacks=callbacks_list,
                        verbose=1,
                        validation_data=(X_test, y_test)
                       )

I get this error:

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead.
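
This error is raised by `skf.split(X, y)` in scikit-learn, not by Keras: `StratifiedKFold` expects `y` to be a 1-D array of class labels, and a 2-D 0/1 array is classified as `multilabel-indicator`. Below is a sketch of two possible workarounds on random stand-in data. The first uses plain `KFold`, which needs no labels; since every row has the same 10/15 balance, it is unclear what stratification would add, so this may be enough. The second keeps `StratifiedKFold` but stratifies on a single 1-D column (column 0, an arbitrary choice made only for illustration) while the full 2-D `y` is still what would be passed to `model.fit`. The values `n_splits=5`, the seed, and the name `strat_labels` are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

rng = np.random.default_rng(0)
# Stand-in data: each row has exactly ten 0s and fifteen 1s, like the real table.
base = np.array([0] * 10 + [1] * 15)
X = np.array([rng.permutation(base) for _ in range(100)])
y = np.roll(X, -1, axis=1)  # 2-D targets, built as in the question

# Option 1: plain K-fold, no stratification labels required.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
kf_folds = list(kf.split(X))

# Option 2: stratify on a 1-D label, but index the 2-D y for training.
strat_labels = y[:, 0]  # arbitrary 1-D binary label, for illustration only
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
skf_folds = list(skf.split(X, strat_labels))

for train_index, test_index in skf_folds:
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]  # still 25 columns wide
```

The array used for stratification and the array used as the training target do not have to be the same object; only the former must be 1-D.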

My model is:

# Initialising the FFNN

model = Sequential()

# Input Layer
model.add(InputLayer(input_shape=(25,)))

# First Hidden Layer
model.add(Dense(250, activation='relu'))

# First Dropout layer
model.add(Dropout(0.2))

# Second Hidden Layer
model.add(Dense(50, activation='relu'))

# Output Layer
model.add(Dense(25, activation=tf.nn.sigmoid))

I'm compiling the model with:

# Compile the model
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
    metrics=[tf.keras.metrics.AUC(name='auc', from_logits=False)]
)
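
One thing to watch when this model is used inside the fold loop: if a single `model` object is built once and `fit` is called on it in every fold, training continues from the weights left by the previous fold, so later folds have already seen their test rows. A minimal sketch of the usual remedy is to build a fresh model per fold. To keep the snippet runnable without TensorFlow, it uses a hypothetical `build_model` factory and a dummy stand-in class; in practice `build_model` would wrap the `Sequential` and `compile` code above.

```python
import numpy as np

class DummyModel:
    """Hypothetical stand-in for a compiled Keras model."""
    instances = 0

    def __init__(self):
        DummyModel.instances += 1
        self.fitted_rows = 0

    def fit(self, X, y):
        # Only records how many rows were seen; no real training happens.
        self.fitted_rows += len(X)

def build_model():
    # Assumption: the real version would build and compile the network here.
    return DummyModel()

def run_folds(X, y, folds):
    """Train one freshly built model per (train_index, test_index) pair."""
    models = []
    for train_index, test_index in folds:
        model = build_model()  # new, untrained model for every fold
        model.fit(X[train_index], y[train_index])
        models.append(model)
    return models

X = np.zeros((10, 25))
y = np.zeros((10, 25))
folds = [(np.arange(0, 5), np.arange(5, 10)),
         (np.arange(5, 10), np.arange(0, 5))]
models = run_folds(X, y, folds)
```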

I have tried other approaches, which produce errors like the one below:

ValueError: logits and labels must have the same shape ((24, 25) vs (24, 1))

What's the correct way to format X and y from a binary DataFrame for use with Stratified K-Fold cross-validation?
