
I found that `restore_best_weights=True` does not actually seem to restore the best weights. A simplified example with some dummy data:

import numpy as np
from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(1)
set_random_seed(2)

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 4., 2., 5.])

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))

model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])
res = model.evaluate(x, y, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')

The output (on my system) is:

Epoch 1/100
1/1 - 0s - loss: 11.8290 - 434ms/epoch - 434ms/step
Epoch 2/100
1/1 - 0s - loss: 1.9091 - 0s/epoch - 0s/step
Epoch 3/100
1/1 - 0s - loss: 1.5159 - 16ms/epoch - 16ms/step
Epoch 4/100
1/1 - 0s - loss: 1.3921 - 0s/epoch - 0s/step
Epoch 5/100
1/1 - 0s - loss: 1.6787 - 0s/epoch - 0s/step
Epoch 6/100
Restoring model weights from the end of the best epoch: 4.
1/1 - 0s - loss: 2.0629 - 33ms/epoch - 33ms/step
Epoch 6: early stopping
1/1 [==============================] - 0s 100ms/step - loss: 1.6787
best=1.3921, loss=1.6787

It looks like the weights are set to those from epoch 4. Then why does the loss still evaluate to the higher value logged for epoch 5 (1.6787)? Is there anything extra I should do to update the model?

I use an up-to-date TensorFlow (version 2.12.0) on Windows x64 (Intel), tf.version.COMPILER_VERSION == 'MSVC 192930140'.

Michel de Ruiter
  • I can't reproduce your training. For one, I am confused that your loss increases although you always present the same input. When I reproduced the code, my loss was always decreasing. But I also used another TensorFlow version ('2.3.1') and only a CPU. Due to my tf version, I could not use set_random_seed, but instead I used random.seed(seed), np.random.seed(seed) and tf.random.set_seed(seed), which is equivalent according to the tf docs. – mss Mar 30 '23 at 13:05
  • It increases because it overshoots (`learning_rate=0.1`). – Michel de Ruiter Mar 30 '23 at 13:42
  • Early-stopping only makes sense if you have a validation dataset. The goal of early-stopping is to find when continuing training would lead to overfitting of the model, at which point it returns the best model _in terms of the validation loss_. Put another way, the validation loss is used as a proxy for how well the model will generalize to (i.e., perform on) the test data. I suspect that the `EarlyStopping` may have undefined behavior when no validation data is used. – ATony Mar 30 '23 at 14:25

1 Answer


I think it has something to do with how the training loss is calculated, but it still works, at least for val_loss. I have done two tests.

1. Without validation:

# Imports as in the question
import numpy as np
from tensorflow.keras.utils import set_random_seed
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping

np.random.seed(1)
set_random_seed(2)

x = np.random.randn(1000)
y = np.random.randn(1000)

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))

model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
stopmon = EarlyStopping(monitor='loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon])
res = model.evaluate(x, y, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')

The output is:

Epoch 1/100
32/32 - 0s - loss: 0.9681 - 468ms/epoch - 15ms/step
Epoch 2/100
32/32 - 0s - loss: 0.9515 - 33ms/epoch - 1ms/step
Epoch 3/100
32/32 - 0s - loss: 0.9675 - 30ms/epoch - 953us/step
Epoch 4/100
Restoring model weights from the end of the best epoch: 2.
32/32 - 0s - loss: 0.9596 - 37ms/epoch - 1ms/step
Epoch 4: early stopping
32/32 [==============================] - 0s 952us/step - loss: 1.0256
best=0.9515, loss=1.0256

You can see that, strangely, the evaluated loss (1.0256) is higher than any of the logged training losses. This is most likely due to how the loss is calculated in the training step: Keras logs the training loss as an average over the batches of the epoch, computed while the weights are still being updated, so it is not the loss of the weights as they stand at the end of the epoch.
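
To make the discrepancy visible, here is a minimal sketch (the EndOfEpochLoss callback is a hypothetical helper, reusing the model, x and y from the test above) that prints the loss Keras logged for each epoch next to the loss computed with the weights as they stand at the end of that epoch:

from tensorflow.keras.callbacks import Callback

class EndOfEpochLoss(Callback):
    """Compares the logged training loss with the end-of-epoch loss."""
    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        # logs['loss'] is averaged over the batches of the epoch, computed
        # while the weights were still being updated; this evaluate() uses
        # the fixed weights reached at the end of the epoch.
        end_loss = self.model.evaluate(self.x, self.y, verbose=0)
        print(f"epoch {epoch + 1}: logged loss={logs['loss']:.4f}, "
              f"end-of-epoch loss={end_loss:.4f}")

Passing EndOfEpochLoss(x, y) in the callbacks list of model.fit should show the two numbers drifting apart whenever the weights change a lot within an epoch.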

2. With validation:

np.random.seed(1)
set_random_seed(2)

x = np.random.randn(1000)
y = np.random.randn(1000)

x2 = np.random.randn(50)
y2 = np.random.randn(50)

model = Sequential()
model.add(Dense(2, input_shape=(1,), activation='tanh'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1))

model.compile(optimizer=RMSprop(learning_rate=0.1), loss='mse')
stopmon = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=2, callbacks=[stopmon], validation_data=(x2, y2))
res = model.evaluate(x2, y2, verbose=1)
print(f'best={stopmon.best:.4f}, loss={res:.4f}')

And the output is:

Epoch 1/100
32/32 - 1s - loss: 0.9681 - val_loss: 1.0496 - 626ms/epoch - 20ms/step
Epoch 2/100
32/32 - 0s - loss: 0.9515 - val_loss: 0.9901 - 57ms/epoch - 2ms/step
Epoch 3/100
32/32 - 0s - loss: 0.9675 - val_loss: 1.0150 - 57ms/epoch - 2ms/step
Epoch 4/100
Restoring model weights from the end of the best epoch: 2.
32/32 - 0s - loss: 0.9596 - val_loss: 1.0154 - 57ms/epoch - 2ms/step
Epoch 4: early stopping
2/2 [==============================] - 0s 2ms/step - loss: 0.9901
best=0.9901, loss=0.9901

You can see that in this case they match. So in conclusion, we can say restore_best_weights works with val_loss, and the mismatch in the first test comes from the training-loss calculation, not from the weights failing to be restored.
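
To double-check that the weights really are restored, here is a minimal sketch (the WeightSnapshots callback is a hypothetical helper, reusing the model, x, y, x2 and y2 from the second test; for a clean run, rebuild and recompile the model first) that snapshots the weights after every epoch and compares the model's final weights with the snapshot of the best epoch:

import numpy as np
from tensorflow.keras.callbacks import Callback, EarlyStopping

class WeightSnapshots(Callback):
    """Stores a copy of the model weights at the end of every epoch."""
    def __init__(self):
        super().__init__()
        self.snapshots = []

    def on_epoch_end(self, epoch, logs=None):
        self.snapshots.append([w.copy() for w in self.model.get_weights()])

snaps = WeightSnapshots()
stopmon = EarlyStopping(monitor='val_loss', patience=2,
                        restore_best_weights=True, verbose=1)
history = model.fit(x, y, epochs=100, verbose=0,
                    callbacks=[stopmon, snaps], validation_data=(x2, y2))

best_epoch = int(np.argmin(history.history['val_loss']))  # 0-based index
same = all(np.array_equal(a, b)
           for a, b in zip(model.get_weights(), snaps.snapshots[best_epoch]))
print(same)  # True: the final weights are those of the best epoch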

TheEngineerProgrammer