I am trying to train a very simple Keras model with the TensorFlow backend in Python.
I am aware that the epoch losses displayed in the console during training are computed 'on the fly' for efficiency, i.e. as running averages over the batch losses, and are therefore not necessarily the true losses of the intermediate models. To my understanding, however, they should be exact if each epoch consists of just one batch containing the whole training set: in that case the model's weights are updated only once, at the end of the epoch, so the model does not change while the epoch's loss is being accumulated.
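To make that expectation concrete, here is a minimal sketch of a callback I would use to compare the logged epoch loss against a loss freshly evaluated on the (now frozen) weights at the end of each epoch. LossCheckCallback is just my own illustrative helper, not a Keras API, and it assumes the same model and data as in the code further below:

import keras

class LossCheckCallback(keras.callbacks.Callback):
    """Compare the logged (running) epoch loss with a freshly evaluated one."""
    def __init__(self, x, y):
        super().__init__()
        self.x = x
        self.y = y

    def on_epoch_end(self, epoch, logs=None):
        # logs['loss'] is the value Keras accumulated while training;
        # evaluate() recomputes the loss with the weights as they are now.
        evaluated = self.model.evaluate(self.x, self.y, verbose=0)
        print('Epoch {}: logged loss = {}, evaluated loss = {}'.format(
            epoch, logs['loss'], evaluated))

If my reasoning were correct, the two printed values should coincide whenever batch_size equals the size of the training set; one would simply append LossCheckCallback(trainInput, trainOutput) to the callbacks list in the fit call below.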
Unfortunately, even if I set the batch size to the size of the training set, the best epoch's loss differs from the loss of the model that is best according to the ModelCheckpoint callback.
Can someone explain this behavior to me? Does the ModelCheckpoint callback perhaps also compute the losses of the intermediate models 'on the fly' in some way?
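For context, from glancing at the Keras source it looks to me as if ModelCheckpoint does not re-evaluate the model at all, but simply reads the value that was already logged for the epoch, roughly like this (a simplified paraphrase on my part, not the actual implementation):

import keras

class SimplifiedCheckpoint(keras.callbacks.Callback):
    """Rough paraphrase of ModelCheckpoint with save_best_only=True."""
    def __init__(self, filepath, monitor='loss'):
        super().__init__()
        self.filepath = filepath
        self.monitor = monitor
        self.best = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        # Reads the already-logged value; no re-evaluation happens here.
        current = logs.get(self.monitor)
        if current is not None and current < self.best:
            self.best = current
            self.model.save(self.filepath)

If that reading is right, the checkpoint should be chosen by exactly the same number that ends up in history.history['loss'], which makes the discrepancy even more confusing to me.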
Here's my code, in which bestEpochLoss and bestModelLoss are never the same:
import numpy
import keras
# Create training data
trainInput = numpy.array([4,3,1,0,2])
trainOutput = numpy.array([0,2,2,0,1])
# Create and train the model
model = keras.Sequential([
    keras.layers.Dense(200, input_shape=(1,), activation='tanh'),
    keras.layers.Dense(1, activation='linear')
])
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adam(learning_rate=0.1))
# Save the model whenever the monitored training loss improves
callbacks = [keras.callbacks.ModelCheckpoint(filepath='model.hdf5', monitor='loss',
                                             verbose=1, save_best_only=True)]
# batch_size equals the training set size, so the weights are updated
# exactly once per epoch
history = model.fit(trainInput, trainOutput, callbacks=callbacks, epochs=20,
                    batch_size=len(trainInput))
# Evaluate the best training epoch's loss vs. the best saved model's loss
bestEpochLoss = numpy.min(history.history['loss'])
bestModel = keras.models.load_model('model.hdf5')
bestModelLoss = bestModel.evaluate(trainInput, trainOutput)
print('Best training epoch\'s loss: ' + str(bestEpochLoss))
print('Best model\'s loss: ' + str(bestModelLoss))