I'm following a tutorial here for implementing Faster R-CNN on a custom dataset using PyTorch.
This is my training loop:
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    # Move the images and targets to the GPU (or whichever device is in use)
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    # Forward pass: in train mode the model returns a dict of losses
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    # Reduce losses over all GPUs for logging purposes
    loss_dict_reduced = reduce_dict(loss_dict)
    losses_reduced = sum(loss for loss in loss_dict_reduced.values())
    loss_value = losses_reduced.item()
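I believe the rest of the step (omitted above) matches the tutorial's train_one_epoch in engine.py; sketched from that reference, it is roughly:

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

    # warmup scheduler used during the first epoch in the reference code
    if lr_scheduler is not None:
        lr_scheduler.step()

    metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
    metric_logger.update(lr=optimizer.param_groups[0]["lr"])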
The metric logger (defined here) outputs the following to the console during training:
Epoch: [0] [ 0/226] eta: 0:07:57 lr: 0.000027 loss: 6.5019 (6.5019) loss_classifier: 0.8038 (0.8038) loss_box_reg: 0.1398 (0.1398) loss_objectness: 5.2717 (5.2717) loss_rpn_box_reg: 0.2866 (0.2866) time: 2.1142 data: 0.1003 max mem: 3827
Epoch: [0] [ 30/226] eta: 0:02:28 lr: 0.000693 loss: 1.3016 (2.4401) loss_classifier: 0.2914 (0.4067) loss_box_reg: 0.2294 (0.2191) loss_objectness: 0.3558 (1.2913) loss_rpn_box_reg: 0.3749 (0.5230) time: 0.7128 data: 0.0923 max mem: 4341
After an epoch has finished, I call an evaluate method which outputs the following:
Test: [ 0/100] eta: 0:00:25 model_time: 0.0880 (0.0880) evaluator_time: 0.1400 (0.1400) time: 0.2510 data: 0.0200 max mem: 4703
Test: [ 99/100] eta: 0:00:00 model_time: 0.0790 (0.0786) evaluator_time: 0.0110 (0.0382) time: 0.1528 data: 0.0221 max mem: 4703
Test: Total time: 0:00:14 (0.1401 s / it)
Averaged stats: model_time: 0.0790 (0.0786) evaluator_time: 0.0110 (0.0382)
Accumulating evaluation results...
DONE (t=0.11s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.263
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.304
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.175
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.086
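For what it's worth, I think I can already pull a single summary number out of this per epoch, since the tutorial's evaluate (in engine.py) returns the CocoEvaluator. A sketch of what I mean (data_loader_test is just the name the tutorial uses for my validation loader):

coco_evaluator = evaluate(model, data_loader_test, device=device)
# stats[0] is AP @ IoU=0.50:0.95, area=all, maxDets=100 (the first line above)
bbox_ap = coco_evaluator.coco_eval["bbox"].stats[0]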
I'm a bit confused by the differing metrics used during training and testing: during training I only see the individual loss terms, while evaluation reports COCO-style AP/AR values. I had wanted to plot training and validation loss (or the equivalent IoU-based metrics) per epoch so I can visualise training and testing performance and check whether any overfitting is occurring.
My question is: how can I compare the model's training and testing performance?
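One idea I've had is to compute a loss over the validation set by keeping the model in train mode (since the torchvision detection models only return the loss dict in train mode) while disabling gradients. This is only an untested sketch of what I mean, and I'm not sure it's the right approach, e.g. whether train mode is safe here (I think it is, because the resnet50-fpn backbone uses FrozenBatchNorm2d, so no running statistics get updated):

import math
import torch

@torch.no_grad()
def validation_loss(model, data_loader, device):
    # Keep train() so the model returns a loss dict, but no gradients flow
    model.train()
    total, n = 0.0, 0
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        loss = sum(v for v in loss_dict.values()).item()
        if math.isfinite(loss):
            total += loss
            n += 1
    return total / max(n, 1)

I would then plot this per-epoch value against the averaged training loss that the metric logger already prints, but I'm not sure whether that is the intended way to do it.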