This is regarding the EarlyStopping class in Keras. I am trying to set things up so that, should the kernel crash, I can resume training from exactly the point where it stopped. When resuming, I just set the resume_training flag to True. This makes the CSVLogger append to the existing trainlog.csv file, and the last weight file saved before the kernel crash is loaded.
import os
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import (ModelCheckpoint, CSVLogger,
                                        EarlyStopping, LearningRateScheduler)

resume_training = True

# Save weights after every epoch so the latest file can be reloaded after a crash.
checkpoint = ModelCheckpoint(filepath=checkpath + '/weights/weights.{epoch:02d}-{val_loss:.2f}.h5',
                             save_weights_only=True, verbose=1)
# Append to the existing trainlog.csv when resuming instead of overwriting it.
csv_logger = CSVLogger(os.path.join(checkpath, 'logger/trainlog.csv'),
                       separator=',',
                       append=resume_training)
early_stopper = EarlyStopping(monitor='accuracy', min_delta=0.001, patience=12)
lr_scheduler = LearningRateScheduler(scheduler)
callbacks_list = [checkpoint, early_stopper, csv_logger, lr_scheduler]

optimiser = optimizers.Adam()
model.compile(optimizer=optimiser, loss='binary_crossentropy', metrics=['accuracy'])

batch_size = 16
initial_epoch = 0

if resume_training:
    # Reload the last weight file written before the kernel crash.
    model.load_weights('last_weight_file.h5')
    # some logic to read initial epoch from csv_logger
Here I also have logic that sets initial_epoch to the last valid value recorded by the csv_logger, roughly like the sketch below.
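For context, a simplified version of that logic, assuming pandas and the 'epoch' column that CSVLogger writes (my real code has a bit more error handling):

import pandas as pd

# Resume from the epoch after the last one CSVLogger managed to record.
log = pd.read_csv(os.path.join(checkpath, 'logger/trainlog.csv'))
if len(log):
    initial_epoch = int(log['epoch'].iloc[-1]) + 1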
model.fit(datagen.flow(xtrain, ytrain, batch_size=batch_size),
          steps_per_epoch=xtrain.shape[0] // batch_size,
          epochs=100,
          verbose=1,
          validation_data=(xval, yval),
          callbacks=callbacks_list,
          initial_epoch=initial_epoch)
My questions are about the patience and restore_best_weights arguments of EarlyStopping.
- Does patience count epochs from the last best value of the monitored metric (even if that epoch happened before the kernel crash), or from the point where training restarted? If it is the latter, how can I make sure the count runs from the last best epoch, even if I set the baseline argument? (One idea I have is sketched after this list.)
- EarlyStopping also has a restore_best_weights argument. Suppose the best weights occurred before the kernel crashed: when I resume fit with the last available weight file and initial_epoch set, does the early stopping mechanism revert to the best weights from before the crash?
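The workaround I am considering for the first point is to subclass EarlyStopping and re-seed its internal wait and best attributes from values I persist myself at the end of every epoch. previous_best and previous_wait below are my own additions, not Keras arguments, and I am not sure whether relying on these internals is safe:

from tensorflow.keras.callbacks import EarlyStopping

class ResumableEarlyStopping(EarlyStopping):
    """EarlyStopping whose counters can be re-seeded after a restart."""

    def __init__(self, previous_best=None, previous_wait=0, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical state that I would save myself (e.g. to a small JSON file)
        # at the end of each epoch and reload when resuming.
        self.previous_best = previous_best
        self.previous_wait = previous_wait

    def on_train_begin(self, logs=None):
        # The stock implementation resets wait/best/best_weights here.
        super().on_train_begin(logs)
        if self.previous_best is not None:
            self.best = self.previous_best  # best monitored value before the crash
            self.wait = self.previous_wait  # epochs already waited without improvement

# early_stopper = ResumableEarlyStopping(previous_best=saved_best, previous_wait=saved_wait,
#                                        monitor='accuracy', min_delta=0.001, patience=12)

This still would not cover restore_best_weights, since as far as I can tell best_weights only lives in memory, which is why I am asking about the second point.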