
I'm training two different CNNs (one custom, one using transfer learning) for an image classification problem. I use the same generator for both models. The dataset contains 5000 samples across 5 classes, but it is imbalanced.

Here's the custom model I'm using.

# Imports assumed by this snippet (omitted in the original post)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def __init__(self, transfer_learning=False, lambda_reg=0.001, drop_out_rate=0.1):
    # lambda_reg is accepted but not used anywhere in this snippet
    if not transfer_learning:
        self.model = Sequential()

        # Convolutional feature extractor
        self.model.add(Conv2D(32, (3, 3), input_shape=(224, 224, 3), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(64, (1, 1), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(128, (3, 3), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Conv2D(128, (1, 1), activation="relu"))
        self.model.add(MaxPooling2D(pool_size=(2, 2)))

        self.model.add(Flatten())

        # Fully connected head (no activation specified, so these Dense layers are linear)
        self.model.add(Dense(512))
        self.model.add(Dropout(drop_out_rate))
        self.model.add(Dense(256))
        self.model.add(Dropout(drop_out_rate))

        # 5-way softmax output for the 5 classes
        self.model.add(Dense(5, activation="softmax"))

What I can't understand is the relation between steps_per_epoch and batch_size. batch_size is the number of samples the generator yields per batch, but is steps_per_epoch the number of batches needed to complete one training epoch? If so, shouldn't it be steps_per_epoch = total_samples / batch_size?
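(For context, here is a minimal sketch of how a Keras generator and these two values typically fit together; the directory path, image size and batch size are placeholders, not taken from the actual setup.)

import math
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 32                    # placeholder; number of samples the generator yields per batch
total_samples = 5000               # the dataset described above: 5000 images, 5 classes

train_datagen = ImageDataGenerator(rescale=1.0 / 255)
train_generator = train_datagen.flow_from_directory(
    "data/train",                  # placeholder path
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode="categorical",
)

# The relation being asked about: is this the right way to set it?
steps_per_epoch = math.ceil(total_samples / batch_size)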

Whatever values I try, I always get the same problem (on both models): val_acc seems to reach a local optimum and stops improving.

Bouji
  • Local optima are often caused by a suboptimal learning rate. Have you tried increasing it? – Florian H May 27 '19 at 06:34
  • No it doesn't change anything. – Bouji May 27 '19 at 07:45
  • It's a little difficult to tell you how to optimize your neural network by only seeing your image generator. – Florian H May 27 '19 at 08:33
  • added the code for the model – Bouji May 27 '19 at 09:35
  • `steps_per_epoch` should have no bound on `batch_size`; `batch_size` controls how much data you train on at the same time - usually the larger the better, but it eats up GPU memory. Steps per epoch limits the maximum number of steps before the model converges, i.e. once your model hits a threshold, or exceeds `steps_per_epoch`, the epoch will halt. An epoch will not necessarily use up all the data. – knh190 May 27 '19 at 09:58
  • Never used keras generator, what does it say? – knh190 May 27 '19 at 10:01
  • Possible duplicate of [What's the difference between "samples\_per\_epoch" and "steps\_per\_epoch" in fit\_generator](https://stackoverflow.com/questions/43457862/whats-the-difference-between-samples-per-epoch-and-steps-per-epoch-in-fit-g) – Markus May 27 '19 at 18:21

2 Answers


You are mixing two issues here. One is how to determine batch_size vs steps_per_epoch; the other is why val_acc seems to reach a local optimum and won't continue improving.

(1) The first issue: batch_size vs steps_per_epoch

The usual strategy is to make batch_size as large as memory permits, especially when you are training on a GPU (4-11 GB). A batch_size of 32 or 64 is normally fine, but in some cases you have to reduce it to 8, 4, or even 1. The training code will throw an out-of-memory error if it cannot allocate enough memory, so you know when to stop increasing batch_size.

Once batch_size is set, steps_per_epoch can be calculated as math.ceil(total_samples / batch_size). But when data augmentation is used, you may sometimes want to set it a few times larger, so each image is seen in several augmented forms per epoch.
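As a minimal sketch, assuming a Keras generator and a compiled model like the ones in the question (train_generator, val_generator, val_samples and the epoch count here are illustrative names and values, not from the original post):

import math

batch_size = 64                        # as large as GPU memory allows
total_samples = 5000                   # training set size
val_samples = 1000                     # illustrative validation set size
steps_per_epoch = math.ceil(total_samples / batch_size)

# With heavy data augmentation you might multiply this, e.g.:
# steps_per_epoch = 3 * math.ceil(total_samples / batch_size)

model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=30,
    validation_data=val_generator,
    validation_steps=math.ceil(val_samples / batch_size),
)

(In recent versions of tf.keras, model.fit accepts generators directly and fit_generator is deprecated; the steps_per_epoch argument works the same way in both.)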

(2) The second issue: val_acc reaches a local optimum and won't continue improving

It is the crux of the matter for deep learning, isn't it? It makes DL both exciting and difficult at the same time. The batch_size, steps_per_epoch and number of epochs won't help much here. It is the model and the hyperparameters (such as learning rate, loss function, optimizer, etc.) that control how the model performs.

A few easy things to try are different learning rates and different optimizers. If you find the model is overfitting (val_acc going down with more epochs), increasing the sample size always helps if it is possible, and data augmentation helps to some degree.
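For example, a rough sketch of experimenting with the optimizer and learning rate in Keras (the specific values and the ReduceLROnPlateau callback are only illustrations, and `model` is assumed to be the network from the question):

from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Try a different optimizer and/or learning rate, e.g. Adam at 1e-4
# instead of the default 1e-3, or SGD(1e-3, momentum=0.9).
model.compile(
    optimizer=Adam(1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Optionally lower the learning rate when validation accuracy plateaus;
# pass callbacks=[reduce_lr] to fit/fit_generator.
# (On older Keras versions the metric is named "val_acc" instead of "val_accuracy".)
reduce_lr = ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=3)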

TML

First of all, steps_per_epoch = total_samples/batch_size is correct in general terms.
Here is an example, written in TensorFlow (1.x), that follows the same pattern:

# (Assumes `mnist`, the placeholders X and Y, the `cost` and `optimizer` ops,
#  the session `sess`, and `training_epochs`/`batch_size` are defined earlier.)
for epoch in range(training_epochs):
    avg_cost = 0
    # Number of batches needed to cover the whole training set once
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning Finished!')

By the way, although it is not exactly related to your question: there are various optimizers, such as Stochastic Gradient Descent and Adam, because training takes too long on a heavy dataset if you try to learn from all of the data at every step. Mini-batch training does not use all of the data at every update. There are many articles about this; here I just leave one of them.

As for your val_acc, it seems that your model has quite a lot of convolution layers.
You already reduced the filters and max pooling of the convolution layers, but I think it is still too much. How is it going? Is it better than before?
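For illustration, one possible slimmer variant of the architecture from the question - purely a sketch of the "fewer layers, fewer parameters" idea, not a tested model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

slim_model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),   # much smaller head than Dense(512) + Dense(256)
    Dropout(0.3),
    Dense(5, activation="softmax"),
])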

yaho cho
  • It didn't change anything yet. I'm trying to reduce the number of convolution layers and lower the number of parameters. I did read the code you provided; that's basically what I was doing, so I'll try to tune the model. – Bouji May 27 '19 at 11:32