I am trying to train a very simple Keras model with the TensorFlow backend in Python.
I am aware that the epoch losses displayed in the console during training are computed 'on the fly' for efficiency, i.e. as running averages over the batch losses, and are therefore not necessarily the true losses of the intermediate models. To my understanding, however, they should be exact if each epoch consists of just one batch containing the whole training set: in that case the model's weights are updated only once, at the end of the epoch, so the model does not change while the epoch's loss is being accumulated.
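To make that expectation concrete, here is a minimal sketch of a callback I would use to compare the logged epoch loss against a loss freshly evaluated on the (now frozen) weights at the end of each epoch. LossCheckCallback is just my own illustrative helper, not a Keras API, and it assumes the same model and data as in the code further below:

import keras

class LossCheckCallback(keras.callbacks.Callback):
    """Compare the logged (running) epoch loss with a freshly evaluated one."""
    def __init__(self, x, y):
        super().__init__()
        self.x = x
        self.y = y

    def on_epoch_end(self, epoch, logs=None):
        # logs['loss'] is the value Keras accumulated while training;
        # evaluate() recomputes the loss with the weights as they are now.
        evaluated = self.model.evaluate(self.x, self.y, verbose=0)
        print('Epoch {}: logged loss = {}, evaluated loss = {}'.format(
            epoch, logs['loss'], evaluated))

If my reasoning were correct, the two printed values should coincide whenever batch_size equals the size of the training set; one would simply append LossCheckCallback(trainInput, trainOutput) to the callbacks list in the fit call below.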
Unfortunately, even if I set the batch size to the size of the training set, the best epoch's loss differs from the loss of the model that is best according to the ModelCheckpoint callback.
Can someone explain this behavior to me? Does the ModelCheckpoint callback perhaps also compute the losses of the intermediate models 'on the fly' in some way?
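For context, from glancing at the Keras source it looks to me as if ModelCheckpoint does not re-evaluate the model at all, but simply reads the value that was already logged for the epoch, roughly like this (a simplified paraphrase on my part, not the actual implementation):

import keras

class SimplifiedCheckpoint(keras.callbacks.Callback):
    """Rough paraphrase of ModelCheckpoint with save_best_only=True."""
    def __init__(self, filepath, monitor='loss'):
        super().__init__()
        self.filepath = filepath
        self.monitor = monitor
        self.best = float('inf')

    def on_epoch_end(self, epoch, logs=None):
        # Reads the already-logged value; no re-evaluation happens here.
        current = logs.get(self.monitor)
        if current is not None and current < self.best:
            self.best = current
            self.model.save(self.filepath)

If that reading is right, the checkpoint should be chosen by exactly the same number that ends up in history.history['loss'], which makes the discrepancy even more confusing to me.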
Here's my code, in which bestEpochLoss and bestModelLoss are never the same:
import numpy
import keras
# Create training data
trainInput = numpy.array([4,3,1,0,2])
trainOutput = numpy.array([0,2,2,0,1])
# Create and train the model
model = keras.Sequential([
    keras.layers.Dense(200, input_shape=(1,), activation='tanh'),
    keras.layers.Dense(1, activation='linear')
])
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adam(learning_rate=0.1))
# Save the model whenever the monitored training loss improves
callbacks = [keras.callbacks.ModelCheckpoint(filepath='model.hdf5', monitor='loss',
                                             verbose=1, save_best_only=True)]
# batch_size equals the training set size, so the weights are updated
# exactly once per epoch
history = model.fit(trainInput, trainOutput, callbacks=callbacks, epochs=20,
                    batch_size=len(trainInput))
# Evaluate the best training epoch's loss vs. the best saved model's loss
bestEpochLoss = numpy.min(history.history['loss'])
bestModel = keras.models.load_model('model.hdf5')
bestModelLoss = bestModel.evaluate(trainInput, trainOutput)
print('Best training epoch\'s loss: ' + str(bestEpochLoss))
print('Best model\'s loss: ' + str(bestModelLoss))