
I have a training set on which I would like to train a neural network, using K-folds cross validation.

TL;DR: Given the number of epochs, the set of hyperparameters to search over, and scoring on the held-out fold, how does RandomizedSearchCV actually train the model? I would expect that, for each combination of parameters, it trains the model on (K-1) folds for epochs epochs and then evaluates it on the remaining fold. But then, what prevents us from overfitting? In "vanilla" training with a fixed validation set, Keras evaluates the model on the validation set after each epoch; does that happen here as well? Even with verbose=1 I don't see any scores from the fit on the remaining fold. I saw here that callbacks can be added to the KerasClassifier, but then, what happens if the settings of KerasClassifier and RandomizedSearchCV clash? Can I add a callback there to monitor val_prc, for example? If so, what would happen? (A sketch of what I have in mind is below.)

Sorry for the long TL;DR!
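
For concreteness, this is roughly what I mean (a sketch only; I am assuming that keyword arguments such as validation_split and callbacks passed to the legacy keras.wrappers.scikit_learn.KerasClassifier are forwarded to model.fit, and that get_model_ defines a metric named prc so that val_prc exists):

from keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.callbacks import EarlyStopping

# Sketch: monitor a metric computed on an internal validation_split carved out
# of the (K-1) training folds, since the held-out CV fold is not passed to fit.
early_stop = EarlyStopping(monitor="val_prc", mode="max", patience=10,
                           restore_best_weights=True)

model = KerasClassifier(build_fn=get_model_,
                        epochs=120,
                        batch_size=32,
                        validation_split=0.1,
                        callbacks=[early_stop],
                        verbose=1)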

Regarding the training procedure, I am using the Keras scikit-learn wrapper. I defined the model with:

model = KerasClassifier(build_fn=get_model_, epochs=120, batch_size=32, verbose=1)

Where get_model_ is a function that returns a compiled tf.keras model.
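
For completeness, a simplified sketch of such a build_fn (illustrative only, not my exact architecture; the important part is that it accepts the hyperparameters that appear in the search space below, i.e. l2, dropout_rate and learning_rate):

from tensorflow import keras
from tensorflow.keras import layers, regularizers
from tensorflow.keras.optimizers import Adam

def get_model_(l2=0.1, dropout_rate=0.3, learning_rate=0.001):
    # Toy architecture; what matters is that the signature matches the searched parameters.
    model = keras.Sequential([
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(l2)),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="prc", curve="PR")])
    return model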

Given the model, the training procedure is the following:

params = {'l2': [0.1, 0.3, 0.5, 0.8],
          'dropout_rate': [0.1, 0.3, 0.5, 0.8],
          'batch_size': [16, 32, 64, 128],
          'learning_rate': [0.001, 0.01, 0.05, 0.1]}


def trainer(model, X, y, folds, params, verbose=None):
    from sklearn.model_selection import RandomizedSearchCV

    v = verbose if verbose else 0

    clf = RandomizedSearchCV(model,
                             param_distributions=params,
                             n_jobs=1,
                             scoring="roc_auc",
                             cv=folds,
                             verbose=v)
    # -------------- fit ------------
    grid_result = clf.fit(X, y)
    # summarize results
    print('- ' * 40)
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    print('- ' * 40)

# ------ Training -------- #
trainer(model, X_train, y_train, folds, params, verbose=1)

First, am I using RandomizedSearchCV correctly? Regardless of the number of options for each parameter I get the same message: Fitting 5 folds for each of 10 candidates, totalling 50 fits
Second, I am dealing with a hard problem: imbalanced data combined with very little data. Even so, I get unexpectedly low scores and high loss values.
Lastly, and following the TL;DR, what is the training procedure actually performed by the code above, assuming it is correct?

Thanks!

David Harar

1 Answer


First, am I using RandomizedSearchCV correctly? Regardless of the number of options for each parameter I get the same message: Fitting 5 folds for each of 10 candidates, totalling 50 fits

RandomizedSearchCV has an argument n_iter which defaults to 10; it therefore samples 10 parameter configurations, no matter how many possible combinations there are. If you want to run all combinations, use GridSearchCV instead.
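
For example (a sketch based on your trainer; with four lists of four values each there are 4^4 = 256 possible combinations):

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Sample 40 random configurations instead of the default 10.
clf = RandomizedSearchCV(model, param_distributions=params, n_iter=40,
                         scoring="roc_auc", cv=folds, n_jobs=1)

# Or try every one of the 4*4*4*4 = 256 combinations exhaustively.
clf = GridSearchCV(model, param_grid=params,
                   scoring="roc_auc", cv=folds, n_jobs=1)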

Second, I am dealing with a hard problem: imbalanced data combined with very little data. Even so, I get unexpectedly low scores and high loss values.

This question is far too broad / ill-posed for Stack Overflow.

Lastly, and following the TL;DR, what is the training procedure actually performed by the code above, assuming it is correct?

Split the data once into 5 folds (X_1, y_1), ..., (X_5, y_5)

for i = 1 to n_iter (default 10):

    Sample a random hyperparameter configuration from the provided space

    scores = []
    for k = 1 to 5:
        Train a fresh model with those hyperparameters on all folds except (X_k, y_k)
        Evaluate that model on (X_k, y_k) with the chosen scorer (here roc_auc)
        Append the score to scores

    if avg(scores) > best_score:
        best_score = avg(scores)
        best_hyperparameters = hyperparameters

Finally, with the default refit=True, one model is refit with best_hyperparameters on all of X, y; that is what you get back as best_estimator_. The models trained on the individual folds are discarded.
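
In your trainer this means that, after clf.fit(X, y), you can inspect the per-candidate results and the refitted best model via the standard scikit-learn attributes:

grid_result = clf.fit(X, y)

print(grid_result.best_params_)                     # hyperparameters of the best candidate
print(grid_result.best_score_)                      # its mean roc_auc over the 5 folds
print(grid_result.cv_results_['mean_test_score'])   # mean score of every sampled candidate

best_model = grid_result.best_estimator_            # refitted on all of X, y (refit=True)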
lejlot