
In order to train a Faster R-CNN (frcnn) model, you need to define two arguments:

  1. num_epochs
  2. epoch_length

The default value for epoch_length is 1000, so with num_epochs = 500 each epoch is 1000 steps long. The article I am following states: 'Note that every batch only processes one image in here.'

So if I am training on a single class with 1300 images, should I change epoch_length to 1300 instead of 1000?

from keras.utils import generic_utils  # provides the Progbar progress bar

num_epochs = 500
epoch_length = 1000  # number of steps (one image per batch) per epoch

for epoch_num in range(num_epochs):

    progbar = generic_utils.Progbar(epoch_length)
    print('Epoch {}/{}'.format(epoch_num + 1, num_epochs))
Malgo
  • Can you add part of your config file here? It would give us a better idea of your configuration. – venkata krishnan Nov 27 '19 at 09:57
  • The config file has the default settings mentioned in its research paper; is the snippet still required? I am adding the code for the training part where these variables are used. Also, when I say that the default value is 1000, I mean that all the articles I came across used 1000. I am getting confused by reading that every batch processes only one image. – Malgo Nov 27 '19 at 10:17

1 Answer


Generally, an epoch_length (or equivalent) argument is useful whenever you don't want (or can't) iterate over the whole dataset in each epoch.

Indeed, the most common definition of epoch is the following:

one epoch = one single pass (forward + backward) on all the training examples

Following this common definition, your model should "see" all the training examples before one epoch is declared concluded; then the next one starts. In this case, training for n epochs means that the model has seen each training example n times.
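As a concrete illustration, here is the arithmetic for the numbers in the question (batch size of one image per step, as the article you quote describes):

num_images = 1300   # training images in the question
batch_size = 1      # "every batch only processes one image"
steps_per_epoch = num_images // batch_size   # = 1300 steps for one full pass over the data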

However, this is not always feasible / what you want to do.

As an extreme example, imagine that you're training your model on synthetic data, which are generated on-the-fly by the data loader. In this setting your training data are virtually infinite, so there is no concept of "iterating over all training examples". One epoch would last forever. Any callback called at epoch end (e.g. saving model weights, calculating metrics) would never run.
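As an illustration, here is a minimal sketch of such an on-the-fly generator (the function name and data shapes are hypothetical, not taken from any specific library):

import numpy as np

def synthetic_batches(batch_size=1):
    # Yields batches forever: there is no "end of dataset" to mark an epoch.
    while True:
        x = np.random.rand(batch_size, 224, 224, 3)          # fake images
        y = np.random.randint(0, 2, size=(batch_size, 1))     # fake labels
        yield x, y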

To solve this issue, you can artificially define a number of batches that delimits one epoch in your particular application. So you can say epoch_length=1000, which means that after training on 1000 examples/batches you consider the epoch terminated and you start a new one. In this way you decide the granularity with which every operation performed at epoch end (e.g. the callbacks above, logging etc.) is executed.
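A minimal sketch of that idea, reusing the hypothetical generator above (epoch_length and the commented-out training call are illustrative, not part of any particular API):

num_epochs = 500
epoch_length = 1000   # e.g. set this to 1300 to match the 1300 training images

gen = synthetic_batches()
for epoch_num in range(num_epochs):
    for step in range(epoch_length):           # one "epoch" = epoch_length batches, by definition
        x, y = next(gen)
        # loss = model.train_on_batch(x, y)    # hypothetical training step
    # epoch-end work (logging, saving weights, metrics) runs every epoch_length steps
    print('Epoch {}/{} done'.format(epoch_num + 1, num_epochs))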

ndrplz