
I recently came across the scikit-optimize package, and I am relatively new to Bayesian optimization, which I want to use for my current convolutional NN. I tried to find the best hyperparameters of the convolutional NN using Bayesian optimization, but my current attempt is not working properly.

So far I have tried to come up with an implementation for this purpose, but my code is not working properly and I can't tell which part of it still has issues. Can anyone point out how to make this right? Is there an efficient implementation for using Bayesian optimization on a convolutional NN to find its best hyperparameters? Any possible thoughts?

Update

I tried GridSearchCV and RandomSearchCV for my convolutional NN, which is really deep, and GridSearchCV took far too long: even 2-3 whole days were not enough to finish the optimization. I want to use a newer optimization framework such as Bayesian optimization (i.e., skopt, optuna) to find the best parameters and hyperparameters of the convolutional NN. Can anyone suggest a possible remedy and an efficient approach for my current attempt 1 in Colab and my attempt 2 in Colab? Any thoughts?

My current attempt:

Here is my current attempt, where I used the scikit-optimize package for Bayesian optimization. In this Colab I ran all my experiments implementing Bayesian optimization on a convolutional NN to find its best hyperparameters:

### objective function passed to the Bayesian optimizer

@use_named_args(dimensions=dimensions)
def bayes_opt(cnn_num_steps, cnn_init_epoch, cnn_max_epoch,
              cnn_learning_rate_decay, cnn_batch_size, cnn_dropout_rate, cnn_init_learning_rate):

    global  iteration, num_steps, init_epoch, max_epoch, learning_rate_decay, dropout_rate, init_learning_rate, batch_size

    num_steps = np.int32(cnn_num_steps)
    batch_size = np.int32(cnn_batch_size)
    learning_rate_decay = np.float32(cnn_learning_rate_decay)
    init_epoch = np.int32(cnn_init_epoch)
    max_epoch = np.int32(cnn_max_epoch)
    dropout_rate = np.float32(cnn_dropout_rate)
    init_learning_rate = np.float32(cnn_init_learning_rate)

    tf.reset_default_graph()
    tf.set_random_seed(randomState)
    sess = tf.Session()

    (train_X, train_y), (test_X, test_y) = cifar10.load_data()
    train_X = train_X.astype('float32') / 255.0
    test_X = test_X.astype('float32') / 255.0

    targets = tf.placeholder(tf.float32, [None, input_size], name="targets")
    
    model_learning_rate = tf.placeholder(tf.float32, None, name="learning_rate")
    model_dropout_rate = tf.placeholder_with_default(0.0, shape=())
    global_step = tf.Variable(0, trainable=False)

    prediction = cnn(model_dropout_rate, model_learning_rate)

    model_learning_rate = tf.train.exponential_decay(learning_rate=model_learning_rate, global_step=global_step, decay_rate=learning_rate_decay,
                                               decay_steps=init_epoch, staircase=False)

    with tf.name_scope('loss'):
        model_loss = tf.losses.mean_squared_error(targets, prediction)

    with tf.name_scope('adam_optimizer'):
        train_step = tf.train.AdamOptimizer(model_learning_rate).minimize(model_loss,global_step=global_step)

    sess.run(tf.global_variables_initializer())

    for epoch_step in range(max_epoch):
        for batch_X, batch_y in generate_batches(train_X, train_y, batch_size):
            train_data_feed = {
                inputs: batch_X,
                targets: batch_y,
                model_learning_rate: init_learning_rate,
                model_dropout_rate: dropout_rate
            }
            sess.run(train_step, train_data_feed)

    ## how to return validation error, any idea?
    ## return validation error
    ## return val_error

My current attempt in Colab still has various issues and is not finished yet. Can anyone provide a workable approach using Bayesian optimization to find the best hyperparameters of a very deep convolutional NN? Any thoughts? Thanks!
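
For reference, skopt expects the decorated function above to return a single scalar to minimize (for example the validation loss, or the negative validation accuracy), and it is driven by a `dimensions` list plus a call to `gp_minimize`. A minimal sketch of that wiring (the bounds below are placeholders, not the real ones):

from skopt import gp_minimize
from skopt.space import Integer, Real

# search space, defined before the @use_named_args-decorated function;
# the bounds here are illustrative placeholders only
dimensions = [
    Integer(low=1, high=20, name='cnn_num_steps'),
    Integer(low=1, high=5, name='cnn_init_epoch'),
    Integer(low=5, high=60, name='cnn_max_epoch'),
    Real(low=0.7, high=0.99, name='cnn_learning_rate_decay'),
    Integer(low=32, high=128, name='cnn_batch_size'),
    Real(low=0.1, high=0.6, name='cnn_dropout_rate'),
    Real(low=1e-4, high=1e-1, prior='log-uniform', name='cnn_init_learning_rate'),
]

# gp_minimize minimizes the value returned by bayes_opt,
# so the function should return e.g. the validation loss
search_result = gp_minimize(func=bayes_opt,
                            dimensions=dimensions,
                            acq_func='EI',
                            n_calls=20)
print(search_result.x)  # best hyperparameter values found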

Hamilton
  • Are you able to optimize a simpler problem first? And then apply it to the cnn? – Justas Aug 09 '20 at 01:40
  • @Justas yes, I used `GridSearchCV` for finding hyperparams, but that's not very efficient for my problem; it is really time consuming for a very deep NN. In this post I used a very simple CNN and tried to find its hyperparams using Bayesian optimization, but it is not working. Do you mind providing a possible remedy to fix my problem? Thanks! – Hamilton Aug 09 '20 at 02:36
  • Try something like https://optuna.readthedocs.io/en/stable/ instead? – AKX Aug 09 '20 at 15:31
  • @AKX here is my attempt using `optuna` in [this colab](https://gist.github.com/jerry-shad/2b76bc6347f0fabf2df221d58e4376b4), can you provide your possible attempts as an answer? Thanks! – Hamilton Aug 09 '20 at 18:16
  • Another tool for this is [hyperopt](https://hyperopt.github.io/hyperopt), although it does not currently implement Bayesian optimization as such (even though, according to the authors, it "has been designed to accommodate Bayesian optimization algorithms"). I posted [this answer](https://stackoverflow.com/a/44182285/1782792) listing a few tools a couple of years ago. – jdehesa Aug 11 '20 at 08:56
  • @jdehesa right, I looked into `hyperopt` and it is not intuitive to code it up and use it for a deep convolutional NN. Could you provide a possible canonical coding solution in the answer thread? I gave Bayesian optimization a shot, but it wasn't working properly; my code has some deficiency. Your possible coding input would be highly appreciated! – Hamilton Aug 11 '20 at 12:42
  • I don't get your point: you absolutely want to use Bayesian optimisation but don't want an algorithm that takes too long to search? In this case just use non-random inits, train for only a few rounds, and hope that the improvements scale. – Al rl Aug 16 '20 at 10:46

2 Answers


I suggest you use the Keras Tuner package for Bayesian optimization.

Below is just a small example of how you can achieve this.

from kerastuner import HyperModel, Objective
from kerastuner.tuners import BayesianOptimization
import tensorflow as tf

# Note: `tokenizer`, `embedding_dim` and the custom `f1` metric used below are
# assumed to be defined elsewhere (this snippet comes from an NLP example).

# Create the keras tuner model.
class MyHyperModel(HyperModel):
    
    def build(self, hp):
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Embedding(len(tokenizer.word_index) + 1, embedding_dim))
        for i in range(hp.Int('num_layers', 1, 3)):
            model.add(tf.keras.layers.Conv1D(filters=hp.Choice('num_filters', values=[32, 64], default=64),activation='relu',
                                             kernel_size=3,
                                             bias_initializer='glorot_uniform'))
            model.add(tf.keras.layers.MaxPool1D())
        
        model.add(tf.keras.layers.GlobalMaxPool1D())
        
        for i in range(hp.Int('num_layers_rnn', 1, 3)):
            model.add(tf.keras.layers.Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'))
            model.add(tf.keras.layers.Dropout(0.2))
        
        model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
        
        model.compile(
            optimizer=hp.Choice('optimizer', values= ['Adam', 'Adadelta', 'Adamax']),
            loss='binary_crossentropy',
            metrics=[f1])
        return model

And then, once the hypermodel is created, you can start training with the following code.

hypermodel = MyHyperModel()

tuner = BayesianOptimization(
    hypermodel,
    objective=Objective('val_f1', direction="max"),
    num_initial_points=50,
    max_trials=15,
    directory='./',
    project_name='real_or_not')

tuner.search(train_dataset,
             epochs=10, validation_data=validation_dataset)
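
Once the search finishes, the best model found can be pulled back out and evaluated; a minimal sketch using the standard Keras Tuner API:

# retrieve the trained model from the best trial and check it on the validation set
best_model = tuner.get_best_models(num_models=1)[0]
best_model.evaluate(validation_dataset)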

You can look at the documentation at this link. I am also attaching a link to a Kaggle notebook that I wrote myself and that demonstrates Bayesian optimization, so that you can try the example out in practice. Feel free to ask any further questions.

UPDATE: 16/08

You commented that you would like the following hyperparameters tuned using Bayesian optimization. I would approach the problem in the following way.

import tensorflow as tf
from kerastuner import HyperModel, Objective
from kerastuner.tuners import BayesianOptimization

class MyHyperModel(HyperModel):

    def build(self, hp):
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Conv2D(filters=hp.Choice('num_filters', values=[32, 64], default=64), activation='relu',
                                         kernel_size=(3, 3),
                                         bias_initializer='glorot_uniform', input_shape=(32, 32, 3)))
        model.add(tf.keras.layers.MaxPooling2D())
        # tunable number of additional convolutional blocks
        for i in range(hp.Int('num_layers', 1, 3)):
            model.add(tf.keras.layers.Conv2D(filters=hp.Choice('num_filters', values=[32, 64], default=64), activation='relu',
                                             kernel_size=(3, 3),
                                             bias_initializer='glorot_uniform'))
            model.add(tf.keras.layers.MaxPooling2D())

        model.add(tf.keras.layers.Flatten())

        # tunable number of dense hidden layers with tunable width and dropout
        for i in range(hp.Int('num_layers_rnn', 1, 3)):
            model.add(tf.keras.layers.Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'))
            model.add(tf.keras.layers.Dropout(rate=hp.Choice('droup_out_rate', values=[0.2, 0.4, 0.5], default=0.2)))

        # note: this head assumes a binary target; for 10-class CIFAR-10 you would
        # use Dense(10, activation='softmax') with a categorical cross-entropy loss
        model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

        model.compile(
            optimizer=tf.keras.optimizers.Adam(
                hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
            loss='binary_crossentropy',
            metrics=['accuracy'])
        return model


class MyTuner(BayesianOptimization):
  def run_trial(self, trial, *args, **kwargs):
    # You can add additional HyperParameters for preprocessing and custom training loops
    # via overriding `run_trial`
    kwargs['batch_size'] = trial.hyperparameters.Int('batch_size', 32, 256, step=32)
    kwargs['epochs'] = trial.hyperparameters.Int('epochs', 10, 30)
    super(MyTuner, self).run_trial(trial, *args, **kwargs)

hypermodel = MyHyperModel()

tuner = MyTuner(
    hypermodel,
    objective=Objective('val_accuracy', direction="max"),
    num_initial_points=50,
    max_trials=15,
    directory='./',
    project_name='cnn_bayesian_opt')

tuner.search(train_dataset, validation_data=validation_dataset)

You can also have a look at the GitHub issue explaining how to tune epochs and batch_size here.

The above code will tune the following parameters, as you requested; the sketch after the list shows how to read the chosen values back out.

  1. number_of_convolutional_filter
  2. number_of_hidden_layer
  3. drop_rate
  4. learning_rate
  5. batch_size
  6. epochs
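
Once tuning is done, the chosen values can be read back from the best trial; a small sketch using the Keras Tuner API (the names match the hyperparameters defined above):

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.get('num_filters'))     # number_of_convolutional_filter
print(best_hp.get('num_layers_rnn'))  # number_of_hidden_layer
print(best_hp.get('droup_out_rate'))  # drop_rate
print(best_hp.get('learning_rate'))   # learning_rate
print(best_hp.get('batch_size'))      # batch_size
print(best_hp.get('epochs'))          # epochs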
pratsbhatt
  • Thanks for your attempt. I looked at it, and it does not seem to be a good fit for optimizing a CNN for image classification. Can you update your attempt using `optuna`? Here is [my attempt using `optuna`](https://gist.github.com/jerry-shad/2b76bc6347f0fabf2df221d58e4376b4), but I am having a hard time getting an efficient solution. Your possible update will be appreciated. – Hamilton Aug 14 '20 at 22:04
  • Can we update your current attempt to use the CNN model that I used in [this gist](https://gist.github.com/jerry-shad/2b76bc6347f0fabf2df221d58e4376b4)? I want to optimize `number_of_convolutional_filter`, `number_of_hidden_layer`, `drop_rate`, `learning_rate`, `batch_size`, `epochs` and `number_of_trails` for my model. Do you mind showing how we can get those done in your updated attempt? Thanks a lot! – Hamilton Aug 14 '20 at 22:11
  • @Hamilton it is a good fit for any type of network. I showed it with an NLP problem, but it can be updated for any network. I will update my answer soon. – pratsbhatt Aug 15 '20 at 10:08

The Ax platform is a very powerful tool for applying Bayesian optimization to deep NNs. Here is my approach using Ax, as follows:

build CNN model

!pip install ax-platform 

from tensorflow.keras import  models
from ax.service.managed_loop import optimize
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# simple CNN for CIFAR-10; n_hidden sets the width of the dense layer
def build_model(opt, dropout, n_hidden=128):
    model = models.Sequential()
    model.add(Conv2D(32, kernel_size=(3,3), input_shape=(32,32,3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Flatten())
    model.add(Dense(n_hidden))
    model.add(Activation('relu'))
    model.add(Dropout(dropout))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

train CNN model

The next step is to train the CNN model and return its accuracy, which will be used by the Bayesian optimization:

def train_evaluate(param):
    acc = 0
    mymodel = build_model(opt=param["opt"], dropout=param["dropout"])
    mymodel.fit(X_train, y_train, epochs=param["epochs"], batch_size=param["batch_size"], verbose=1, validation_data=(X_test, y_test))
    acc = mymodel.evaluate(X_test, y_test)[1]
    print(param, acc)
    del mymodel
    return acc

run Bayesian optimization

best_parameters, values, experiment, model = optimize(
     parameters=[
                 {"name": "opt", "type": "choice", "values": ['adam', 'rmsprop', 'sgd']},
                 {"name": "dropout", "type": "choice", "values": [0.0, 0.25, 0.50, 0.75, 0.99]},
                 {"name": "epochs", "type": "choice", "values": [10, 50, 100]},
                 {"name": "batch_size", "type": "choice", "values": [32,64, 100, 128]}
                ],
    evaluation_function=train_evaluate,
    objective_name="acc",
    total_trials=10,
    )

return best parameters

data = experiment.fetch_data()
df = data.df
best_arm_name = df.arm_name[df["mean"] == df["mean"].max()].values[0]
best_arm = experiment.arms_by_name[best_arm_name]

print(best_parameters)
print(best_arm)

Note that you could add other parameters that you want to optimize, such as learning_rate or num_hidden_layer, in the same fashion that I showed above; a sketch follows below. I hope this works for your needs. Let me know if you have further questions. Good luck!
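
For example, the learning rate could be exposed as a log-scaled range parameter and the dense-layer width as a choice parameter. The names lr and n_hidden below are illustrative only, and build_model/train_evaluate would need to read them from param:

# hypothetical extension of the search space
best_parameters, values, experiment, model = optimize(
    parameters=[
        {"name": "opt", "type": "choice", "values": ['adam', 'rmsprop', 'sgd']},
        {"name": "dropout", "type": "choice", "values": [0.0, 0.25, 0.50, 0.75]},
        {"name": "lr", "type": "range", "bounds": [1e-4, 1e-1], "log_scale": True},
        {"name": "n_hidden", "type": "choice", "values": [64, 128, 256]},
        {"name": "epochs", "type": "choice", "values": [10, 50, 100]},
        {"name": "batch_size", "type": "choice", "values": [32, 64, 100, 128]},
    ],
    evaluation_function=train_evaluate,
    objective_name="acc",
    total_trials=10,
)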

Jerry07