Generally, you use an epoch_length (or equivalent) argument whenever you don't want to (or can't) iterate over the whole dataset for each epoch.
Indeed, the most common definition of an epoch is the following:

one epoch = one single pass (forward + backward) over all the training examples
Following this common definition, your model should "see" all the training examples before one epoch is declared concluded; then the next one starts. In this case, training for n epochs means that the model has seen each training example n times.
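As a minimal sketch of that conventional setup (assuming PyTorch, with a made-up linear model and random data purely for illustration):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model, only for illustration
dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

n_epochs = 3
for epoch in range(n_epochs):
    for x, y in loader:          # one epoch = one full pass over the dataset
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # epoch-end work (checkpointing, metrics, logging, ...) runs here,
    # once per full pass over the data
```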
However, this is not always feasible, or not always what you want to do.
As an extreme example, imagine that you're training your model on synthetic data generated on the fly by the data loader. In this setting your training data is virtually infinite, so there is no notion of "iterating over all training examples": one epoch would last forever, and any callback run at the end of an epoch (e.g. saving model weights, computing metrics) would never execute.
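To make this concrete, here is a hypothetical on-the-fly generator (the function name and the synthetic target are made up for the example): it never stops yielding batches, so a classic epoch would simply never end:

```python
import torch

def synthetic_batches(batch_size=32, n_features=10):
    """Infinite stream of synthetic (x, y) batches generated on the fly."""
    while True:                               # never raises StopIteration
        x = torch.randn(batch_size, n_features)
        y = x.sum(dim=1, keepdim=True)        # arbitrary synthetic target
        yield x, y
```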
To solve this issue, you can artificially define the number of batches that delimits one epoch in your particular application. For example, you can set epoch_length=1000, which means that after training on 1000 examples/batches you consider the epoch terminated and start a new one. In this way you decide the granularity at which the operations performed at epoch end (the callbacks above, logging, etc.) are executed.
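A minimal sketch of that idea, reusing the same kind of infinite generator and treating every epoch_length batches as one artificial epoch (the names here are illustrative, not a specific library's API):

```python
import itertools
import torch
from torch import nn

def synthetic_batches(batch_size=32, n_features=10):
    while True:                                # same infinite stream as above
        x = torch.randn(batch_size, n_features)
        yield x, x.sum(dim=1, keepdim=True)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

epoch_length = 1000                            # batches per artificial epoch
max_epochs = 5
stream = synthetic_batches()

for epoch in range(max_epochs):
    # islice cuts the infinite stream after epoch_length batches
    for x, y in itertools.islice(stream, epoch_length):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # epoch-end work now runs every epoch_length batches instead of never
```

Training libraries usually expose the same knob directly, e.g. epoch_length in PyTorch Ignite's Engine.run or steps_per_epoch in Keras' model.fit.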