I am training a model with the Faster R-CNN architecture. For the first session I used the config below:
```python
from detectron2 import model_zoo
from detectron2.config import get_cfg


def get_train_cfg(config_file_path, checkpoint_url, train_dataset_name,
                  test_dataset_name, num_classes, device, output_dir):
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(config_file_path))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(checkpoint_url)
    cfg.DATASETS.TRAIN = (train_dataset_name,)
    cfg.DATASETS.TEST = (train_dataset_name,)  # note: TEST points at the train set here
    cfg.DATALOADER.NUM_WORKERS = 2
    cfg.SOLVER.IMS_PER_BATCH = 1
    cfg.SOLVER.BASE_LR = 0.0001
    cfg.SOLVER.MAX_ITER = 16000
    cfg.SOLVER.STEPS = []  # empty: no LR decay milestones
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    cfg.MODEL.DEVICE = device
    cfg.OUTPUT_DIR = output_dir
    return cfg
```
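For reference, this is roughly how I run the first session; the model zoo path, dataset names, class count, and output directory below are placeholders, not the exact values from my notebook:

```python
import os

from detectron2.engine import DefaultTrainer

# Placeholder arguments; my real dataset registration happens elsewhere.
cfg = get_train_cfg(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",  # config_file_path (placeholder)
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",  # checkpoint_url (placeholder)
    "my_train_dataset",
    "my_test_dataset",
    num_classes=5,        # placeholder
    device="cuda",
    output_dir="./output",
)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)  # start fresh for the first session
trainer.train()
```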
I want to continue my training. I have `last_checkpoint`, `metrics.json`, `cfg.pickle`, and `model_final.pth`.
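As far as I understand, `last_checkpoint` is just a small text file holding the filename of the most recent checkpoint; I checked it like this (the path is an assumption based on my `OUTPUT_DIR`):

```python
# Prints the basename of the newest checkpoint, e.g. "model_final.pth".
with open("./output/last_checkpoint") as f:
    print(f.read())
```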
This is the link to my notebook.
The training should resume at iteration 16001, and the total loss at iteration 16000 was about 0.8. However, the learning rate never changed from 0.0001 between iterations 0 and 16000. When I continue training with `resume_or_load(resume=True)`, the run stops immediately and the log shown after the sketch below is printed.
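This is roughly how I resume; a minimal sketch, assuming `cfg` is rebuilt with the same `get_train_cfg(...)` call and the same `OUTPUT_DIR` as the first session:

```python
from detectron2.engine import DefaultTrainer

trainer = DefaultTrainer(cfg)  # cfg rebuilt exactly as in the first session
# resume=True makes the checkpointer read OUTPUT_DIR/last_checkpoint and
# restore the model weights, optimizer, scheduler, and iteration counter.
trainer.resume_or_load(resume=True)
trainer.train()
```

Running this produces the following log: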
```
[12/04 05:54:36 d2.data.datasets.coco]: Loaded 381 images in COCO format from ../input/cascade-rcnn/train.json
[12/04 05:54:36 d2.data.build]: Removed 1 images with no usable annotations. 380 images left.
[12/04 05:54:36 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[12/04 05:54:36 d2.data.build]: Using training sampler TrainingSampler
[12/04 05:54:36 d2.data.common]: Serializing 380 elements to byte tensors and concatenating them all ...
[12/04 05:54:36 d2.data.common]: Serialized dataset takes 2.23 MiB
[12/04 05:54:37 d2.engine.hooks]: Loading scheduler from state_dict ...
[12/04 05:54:37 d2.engine.train_loop]: Starting training from iteration 16000
[12/04 05:54:37 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[12/04 05:54:37 d2.data.datasets.coco]: Loaded 381 images in COCO format from ../input/cascade-rcnn/train.json
[12/04 05:54:38 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[12/04 05:54:38 d2.data.common]: Serializing 381 elements to byte tensors and concatenating them all ...
[12/04 05:54:38 d2.data.common]: Serialized dataset takes 2.23 MiB
WARNING [12/04 05:54:38 d2.engine.defaults]: No evaluator found. Use DefaultTrainer.test(evaluators=), or implement its build_evaluator method.
[12/04 05:54:38 d2.utils.events]: iter: 16001 lr: N/A max_mem: 1627M
```
It shows `lr: N/A`. Why is that?
I am using:
- Python: 3.7.10
- Detectron2: 0.6
- Torch: 1.9.1+cu101
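For the continued session, my guess (untested) is that the schedule also has to be extended before resuming, since the checkpoint is already at `MAX_ITER`; the numbers below are placeholders:

```python
from detectron2.engine import DefaultTrainer

# Rebuild cfg exactly as in the first session, then extend the schedule so
# the resumed run has iterations left to execute (checkpoint is at 16000).
cfg = get_train_cfg(config_file_path, checkpoint_url, train_dataset_name,
                    test_dataset_name, num_classes, device, output_dir)
cfg.SOLVER.MAX_ITER = 32000        # placeholder target; must exceed the saved 16000
cfg.SOLVER.STEPS = (20000, 26000)  # placeholder decay milestones; first run used []

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=True)
trainer.train()
```

Is this the right way to continue past iteration 16000, or is something else causing the `lr: N/A`?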