I'm training a neural network for my project using Keras. Keras provides a function for early stopping. Which parameters should I watch to prevent my neural network from overfitting by using early stopping?
2 Answers
Early stopping is basically stopping the training once your loss starts to increase (or in other words, once validation accuracy starts to decrease). According to the documentation, it is used as follows:
keras.callbacks.EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=0,
                              verbose=0, mode='auto')
Values depend on your implementation (problem, batch size, etc.), but generally, to prevent overfitting, I would:

- Monitor the validation loss (this requires cross validation, or at least train/test sets) by setting the `monitor` argument to `'val_loss'`.
- `min_delta` is a threshold that decides whether to quantify the loss change at some epoch as an improvement or not. If the difference in loss is below `min_delta`, it is quantified as no improvement. It is better to leave it at 0, since we're interested in when the loss becomes worse.
- The `patience` argument represents the number of epochs to wait before stopping once your loss starts to increase (stops improving). This depends on your implementation: if you use very small batches or a large learning rate, your loss will zig-zag (accuracy will be more noisy), so better to set a large `patience`. If you use large batches and a small learning rate, your loss will be smoother, so you can use a smaller `patience`. Either way, I'll leave it at 2 to give the model more of a chance.
- `verbose` decides what to print; leave it at the default (0).
- The `mode` argument depends on which direction your monitored quantity is supposed to move (is it supposed to be decreasing or increasing). Since we monitor the loss, we could use `min`, but let's leave Keras to handle that for us and set it to `auto`.
So I would use something like the following, and experiment by plotting the loss with and without early stopping.
keras.callbacks.EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=2,
                              verbose=0, mode='auto')
In case there is any ambiguity about how callbacks work, I'll try to explain more. Once you call `fit(..., callbacks=[es])` on your model, Keras calls predetermined functions on the given callback objects. These functions are named `on_train_begin`, `on_train_end`, `on_epoch_begin`, `on_epoch_end`, `on_batch_begin` and `on_batch_end`. The early stopping callback is called at the end of every epoch; it compares the best monitored value with the current one and stops if its conditions are met (how many epochs have passed since the observation of the best monitored value, and is it more than the `patience` argument; is the difference from the last value bigger than `min_delta`, etc.).
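As a rough sketch of those hooks (the class name and printed message below are my own, not from Keras), a custom callback that acts at the same point in the training loop as early stopping would look like this:

from keras.callbacks import Callback

class EpochEndLogger(Callback):
    # Keras invokes this hook after every epoch; EarlyStopping performs its
    # comparison against the best monitored value at this same point.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print('epoch %d finished, val_loss=%s' % (epoch, logs.get('val_loss')))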
As pointed out by @BrentFaust in the comments, the model's training will continue until either the early stopping conditions are met or the `epochs` parameter (default=10) in `fit()` is satisfied. Setting an early stopping callback will not make the model train beyond its `epochs` parameter, so calling the `fit()` function with a larger `epochs` value would benefit more from the early stopping callback, as sketched below.
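To make that concrete, here is a minimal sketch of the recommended pattern (`model`, `x_train` and `y_train` are placeholders for your own model and data):

from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', min_delta=0, patience=2,
                   verbose=0, mode='auto')

# Give fit() a generous epoch budget; early stopping cuts training short
# once val_loss stops improving, so the upper bound is rarely reached.
model.fit(x_train, y_train,
          validation_split=0.2,
          epochs=100,
          callbacks=[es])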

- Thank you. I still cannot understand the definition of min delta. From the documentation, min delta is defined as the minimum change of the monitored value. May I know how Keras defines the minimum change of the monitored value? Is it referring to the difference between the current val loss and the previous val loss? – AizuddinAzman May 11 '17 at 11:48
- @AizuddinAzman Close: `min_delta` is a threshold that decides whether to quantify the change in the monitored value as an improvement or not. So yes, if we give `monitor = 'val_loss'`, then it would refer to the difference between the current validation loss and the previous validation loss. In practice, if you give `min_delta=0.1`, a decrease in validation loss (current - previous) smaller than 0.1 would not count as an improvement, and training would thus stop (if you have `patience = 0`). – umutto May 12 '17 at 01:33
- Does it make sense to decrease the learning rate when val_loss has not improved for n epochs? And how can it be done? – mrgloom May 18 '17 at 15:17
- @mrgloom Yes, it makes sense. It wouldn't help with overfitting, but as you said, if the training loss (I can't guarantee `val_loss`, but it helps that too if you are using a good model) stops improving, the reason may be that your learning rate is too aggressive, or your loss function is not precise enough to traverse down to an optimum, etc., i.e. the loss starts to be noisy or to increase. Learning rate decay helps with that. You can use the `decay` parameter on your [optimizer](https://keras.io/optimizers/) or set up a [learning rate scheduler](https://keras.io/callbacks/#learningratescheduler) callback in Keras. – umutto May 19 '17 at 01:22
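For mrgloom's question specifically, Keras also ships a built-in callback, `ReduceLROnPlateau`, that lowers the learning rate when the monitored value stops improving. A minimal sketch (the factor, patience and floor values are illustrative, and `model`/`x_train`/`y_train` are placeholders):

from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if val_loss has not improved for 5 epochs,
# but never drop it below 1e-6.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=5, min_lr=1e-6, verbose=1)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[reduce_lr])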
- Note that `callbacks=[EarlyStopping(patience=2)]` has no effect unless `epochs` is given to `model.fit(..., epochs=max_epochs)`. – Brent Faust Oct 17 '17 at 00:52
- @umutto I should clarify: `model.fit` will perform so many epochs by default. Providing an EarlyStopping callback with a patience value will not cause it to continue training past its default stopping point unless `epochs` is given. That way, you force it to keep training unless the EarlyStopping conditions are met. – Brent Faust Oct 18 '17 at 06:14
- @BrentFaust That is my understanding as well; I wrote the answer on the assumption that the model is being trained with at least 10 epochs (the default). After your comment, I realized that there may be a case where the programmer calls fit with `epochs=1` in a for loop (for various use cases), in which this callback would fail. If there is ambiguity in my answer, I will try to put it in a better way. – umutto Oct 18 '17 at 07:10
- With the EarlyStopping callback, does the resulting model object from `model.fit` return the best model, or the one on which the training stopped (the final epoch trained)? – AdmiralWen Aug 31 '18 at 02:08
- @AdmiralWen Since I wrote the answer, the code has changed a bit. If you are using the latest version of Keras, you can use the [`restore_best_weights`](https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L480&L483) argument (not in the documentation yet), which loads the model with the best weights after training. But for your purposes I would use the [`ModelCheckpoint`](https://keras.io/callbacks/#modelcheckpoint) callback with the `save_best_only` argument. You can check the documentation; it is straightforward to use, but you need to manually load the best weights after training. – umutto Aug 31 '18 at 06:09
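A minimal sketch of the `ModelCheckpoint` approach described above (the file name is arbitrary, and `model`/`x_train`/`y_train` are placeholders):

from keras.callbacks import ModelCheckpoint

# Overwrite the checkpoint file only when val_loss improves, so the file
# always holds the best weights seen so far.
checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_loss',
                             save_best_only=True, save_weights_only=True)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[checkpoint])

# The manual step mentioned above: reload the best weights after training.
model.load_weights('best_weights.h5')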
- @umutto Hello, thanks for the suggestion of `restore_best_weights`; however, I am unable to use it: `es = EarlyStopping(monitor='val_acc', min_delta=1e-4, patience=patience_, verbose=1, restore_best_weights=True)` raises `TypeError: __init__() got an unexpected keyword argument 'restore_best_weights'`. Any ideas? Keras 2.2.2, tf 1.10. What is your version? – Arka Mallick Sep 23 '18 at 14:02
- @Haramoz Ah, sorry, my bad, I hadn't realized it wasn't released yet. It seems that the commit came after the last Keras release (2.2.2). It will probably be included in the next release, or you can install Keras from source (which could be unstable). I would recommend using the [`ModelCheckpoint` callback](https://keras.io/callbacks/#modelcheckpoint) in the meantime. – umutto Sep 25 '18 at 01:00
- Yes man, I am able to use this feature already by installing Keras from the GitHub source. – Arka Mallick Sep 25 '18 at 08:34
- Isn't `patience=2` very low? The model can start badly and then improve after some epochs, and you will miss it. Why not use a higher `patience` and use `save_best_only`? – Mr.O Sep 14 '20 at 09:37
Here's an example of EarlyStopping from another project, AutoKeras (https://autokeras.com/), an automated machine learning (AutoML) library. The library sets two EarlyStopping parameters: `patience=10` and `min_delta=1e-4`. The default quantity to monitor, for both AutoKeras and Keras, is the `val_loss`:

https://github.com/keras-team/keras/blob/cb306b4cc446675271e5b15b4a7197efd3b60c34/keras/callbacks.py#L1748 https://autokeras.com/image_classifier/
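Expressed as a plain Keras callback, those AutoKeras settings would look like this (a sketch; AutoKeras constructs the callback internally):

from keras.callbacks import EarlyStopping

# Same settings AutoKeras uses; monitor defaults to 'val_loss'.
es = EarlyStopping(patience=10, min_delta=1e-4)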
