I'm following a tutorial here for implementing Faster R-CNN on a custom dataset using PyTorch.
This is my training loop:
for images, targets in metric_logger.log_every(data_loader, print_freq, header):
    # Move the images and targets to the GPU (or whichever device is in use)
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    # Forward pass: in train mode the model returns a dict of losses
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
    # Reduce losses over all GPUs for logging purposes
    loss_dict_reduced = reduce_dict(loss_dict)
    losses_reduced = sum(loss for loss in loss_dict_reduced.values())
    loss_value = losses_reduced.item()
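I believe the rest of the step (omitted above) matches the tutorial's train_one_epoch in engine.py; sketched from that reference, it is roughly:

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

    # warmup scheduler used during the first epoch in the reference code
    if lr_scheduler is not None:
        lr_scheduler.step()

    metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
    metric_logger.update(lr=optimizer.param_groups[0]["lr"])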
The metric logger (defined here) outputs the following to the console during training:
Epoch: [0] [ 0/226] eta: 0:07:57 lr: 0.000027 loss: 6.5019 (6.5019) loss_classifier: 0.8038 (0.8038) loss_box_reg: 0.1398 (0.1398) loss_objectness: 5.2717 (5.2717) loss_rpn_box_reg: 0.2866 (0.2866) time: 2.1142 data: 0.1003 max mem: 3827
Epoch: [0] [ 30/226] eta: 0:02:28 lr: 0.000693 loss: 1.3016 (2.4401) loss_classifier: 0.2914 (0.4067) loss_box_reg: 0.2294 (0.2191) loss_objectness: 0.3558 (1.2913) loss_rpn_box_reg: 0.3749 (0.5230) time: 0.7128 data: 0.0923 max mem: 4341
After an epoch has finished, I call an evaluate method which outputs the following:
Test: [ 0/100] eta: 0:00:25 model_time: 0.0880 (0.0880) evaluator_time: 0.1400 (0.1400) time: 0.2510 data: 0.0200 max mem: 4703
Test: [ 99/100] eta: 0:00:00 model_time: 0.0790 (0.0786) evaluator_time: 0.0110 (0.0382) time: 0.1528 data: 0.0221 max mem: 4703
Test: Total time: 0:00:14 (0.1401 s / it)
Averaged stats: model_time: 0.0790 (0.0786) evaluator_time: 0.0110 (0.0382)
Accumulating evaluation results...
DONE (t=0.11s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.263
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.346
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.304
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.027
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.175
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.264
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.351
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.086
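For what it's worth, I think I can already pull a single summary number out of this per epoch, since the tutorial's evaluate (in engine.py) returns the CocoEvaluator. A sketch of what I mean (data_loader_test is just the name the tutorial uses for my validation loader):

coco_evaluator = evaluate(model, data_loader_test, device=device)
# stats[0] is AP @ IoU=0.50:0.95, area=all, maxDets=100 (the first line above)
bbox_ap = coco_evaluator.coco_eval["bbox"].stats[0]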
I'm a bit confused by the differing metrics used during training and testing: during training I only see the individual loss terms, while evaluation reports COCO-style AP/AR values. I had wanted to plot training and validation loss (or the equivalent IoU-based metrics) per epoch so I can visualise training and testing performance and check whether any overfitting is occurring.
My question is: how can I compare the model's training and testing performance?
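One idea I've had is to compute a loss over the validation set by keeping the model in train mode (since the torchvision detection models only return the loss dict in train mode) while disabling gradients. This is only an untested sketch of what I mean, and I'm not sure it's the right approach, e.g. whether train mode is safe here (I think it is, because the resnet50-fpn backbone uses FrozenBatchNorm2d, so no running statistics get updated):

import math
import torch

@torch.no_grad()
def validation_loss(model, data_loader, device):
    # Keep train() so the model returns a loss dict, but no gradients flow
    model.train()
    total, n = 0.0, 0
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        loss = sum(v for v in loss_dict.values()).item()
        if math.isfinite(loss):
            total += loss
            n += 1
    return total / max(n, 1)

I would then plot this per-epoch value against the averaged training loss that the metric logger already prints, but I'm not sure whether that is the intended way to do it.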