Resume training a Faster R-CNN model from the last step after an interrupted session on Google Colab

Question

I used Faster R-CNN to train my model on Google Colab. However, because of the GPU time limit, training stopped at step 7,000 of 10,000.

I used the following command to train the model:

!python model_main_tf2.py --model_dir=models/my_faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/my_faster_rcnn_inception_resnet_v2/pipeline.config

How can I resume training my model from the last step (i.e. 7000)?

score 1 · Answer 1 · answered Mar 25 '21 at 21:55

I figured it out after following this tutorial. To resume training, I changed fine_tune_checkpoint in pipeline.config from the pre-trained model's checkpoint (e.g. pretrained_mode/checkpoint/ckpt-0) to the directory where my own training checkpoints are saved (e.g. /my_training_model/ckpt-#). Replace # with the number of the last checkpoint, e.g. ckpt-9.
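The config change described above could be scripted roughly as follows. This is only a sketch under a few assumptions: that the training directory holds TF2-style checkpoint files named ckpt-N.index, and that pipeline.config contains a quoted fine_tune_checkpoint field. The helper names latest_checkpoint_prefix and set_fine_tune_checkpoint are made up for illustration and are not part of the Object Detection API. If TensorFlow is available, tf.train.latest_checkpoint(model_dir) should give a similar result for the lookup step.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(model_dir):
    """Return the prefix of the highest-numbered ckpt-N in model_dir, or None.

    Hypothetical helper: assumes checkpoints are stored as ckpt-N.index files.
    """
    best_num, best_prefix = -1, None
    for index_file in Path(model_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match and int(match.group(1)) > best_num:
            best_num = int(match.group(1))
            # Drop ".index": the config expects the prefix ".../ckpt-N".
            best_prefix = index_file.with_suffix("").as_posix()
    return best_prefix


def set_fine_tune_checkpoint(config_text, checkpoint_prefix):
    """Return config_text with the fine_tune_checkpoint value replaced.

    Hypothetical helper: plain text substitution, no protobuf parsing.
    The pattern does not touch fine_tune_checkpoint_type.
    """
    pattern = r'(fine_tune_checkpoint\s*:\s*)"[^"]*"'
    new_text, count = re.subn(
        pattern, lambda m: f'{m.group(1)}"{checkpoint_prefix}"', config_text
    )
    if count == 0:
        raise ValueError("no fine_tune_checkpoint field found in config")
    return new_text


if __name__ == "__main__":
    # Paths taken from the training command in the question.
    model_dir = "models/my_faster_rcnn_inception_resnet_v2"
    config_path = Path(model_dir) / "pipeline.config"
    prefix = latest_checkpoint_prefix(model_dir)
    if prefix is not None and config_path.exists():
        updated = set_fine_tune_checkpoint(config_path.read_text(), prefix)
        config_path.write_text(updated)
        print("fine_tune_checkpoint ->", prefix)
```

Checkpoint numbers are compared as integers rather than as strings, so ckpt-10 is treated as newer than ckpt-9.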

Then I reran the same command to resume training:

!python model_main_tf2.py --model_dir=models/my_faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/my_faster_rcnn_inception_resnet_v2/pipeline.config
