I'm new to Gluon, and I decided to run the examples to get familiar with the coding style (I used Keras a couple of years ago, and this hybrid style is a little confusing to me).
My problem is that I can run the examples, but after successfully executing every cell in this example (it's a Jupyter notebook), I upload an external image and the net seems incapable of detecting any object. I pasted the same cell into 02. Predict with pre-trained Faster RCNN models, and the pre-trained net had no problem detecting every person in the image, so it seems to me that the model in the example is not being trained correctly.
Has this happened to anyone else?
Am I missing something?
Thank you in advance!
(By the way, I have tried uncommenting the 32nd line of the training loop (the one with autograd.backward) and changing the break-if limit in the same loop, with no luck.)
LINKS
I'm having this trouble while executing the original examples plus the cell below.
02) https://gluon-cv.mxnet.io/build/examples_detection/demo_faster_rcnn.html
06) https://gluon-cv.mxnet.io/build/examples_detection/train_faster_rcnn_voc.html
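For reference, the working cell from 02 boils down to something like this (a minimal sketch; faster_rcnn_resnet50_v1b_voc is the pre-trained model that demo loads, and pretrained_net is just my name for it):

from gluoncv import model_zoo, data, utils
from matplotlib import pyplot as plt

# fully pre-trained detector, as used in demo 02
pretrained_net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True)

# same preprocessing + inference + visualization as my cell below
x, img = data.transforms.presets.rcnn.load_test('unnamed.jpg')
box_ids, scores, bboxes = pretrained_net(x)
utils.viz.plot_bbox(img, bboxes[0], scores[0], box_ids[0], class_names=pretrained_net.classes)
plt.show()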
My test image
Cell to detect objects in the image
from gluoncv import data, utils
from gluoncv.data.transforms import presets
from matplotlib import pyplot as plt

short, max_size = 600, 800
# note: this train transform is never actually applied; load_test does its own resizing
RCNN_transform = presets.rcnn.FasterRCNNDefaultTrainTransform(short, max_size)

myImg = 'unnamed.jpg'
x, img = data.transforms.presets.rcnn.load_test(myImg)  # batchified tensor + original image
box_ids, scores, bboxes = net(x)  # net is the model trained in example 06
ax = utils.viz.plot_bbox(img, bboxes[0], scores[0], box_ids[0], class_names=net.classes)
plt.show()
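One detail that might matter: utils.viz.plot_bbox keeps only boxes with scores above thresh=0.5 by default, so a barely-trained net could be producing detections that are simply filtered out. Re-plotting with a lower threshold shows whether it predicts anything at all:

# same call as above, but with a lower score threshold (0.1 is an arbitrary choice)
ax = utils.viz.plot_bbox(img, bboxes[0], scores[0], box_ids[0],
                         class_names=net.classes, thresh=0.1)
plt.show()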
system info (if relevant)
I'm running this both on my personal computer and on Google Colab and get the same results, but just in case:
OS: Ubuntu 18.04
hardware
$ hwinfo --short
cpu:
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2700 MHz  (same line repeated ×8, one per logical core)
graphics card:
nVidia GM107M [GeForce GTX 960M]
Intel HD Graphics 530
NVidia driver
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960M Off | 00000000:02:00.0 Off | N/A |
| N/A 41C P5 N/A / N/A | 665MiB / 4046MiB | 23% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2560 G /usr/lib/xorg/Xorg 308MiB |
| 0 2921 G /usr/bin/gnome-shell 132MiB |
| 0 3741 G ...quest-channel-token=7390050445218241480 31MiB |
| 0 5455 G ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files 176MiB |
+-----------------------------------------------------------------------------+
CUDA
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
MXNet and GluonCV installed via
$ pip install mxnet-cu102mkl
$ pip install --upgrade mxnet-cu102mkl gluoncv
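To confirm which versions actually ended up installed:

import mxnet as mx
import gluoncv
print('mxnet', mx.__version__, '| gluoncv', gluoncv.__version__)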
EDIT: I have been making modifications to the training loop; this is what I have so far. The first block of lines right after the third for loop just moves the data onto the GPU.
import mxnet as mx
from mxnet import autograd

# net.hybridize()
epochs = 50
for epoch in range(epochs):
    print("epoch: ", epoch, "---------------------------------")
    batch_size = 10
    for ib, batch in enumerate(train_loader):
        # print(ib)
        if ib > 500:
            break
        for dataa, label, rpn_cls_targets, rpn_box_targets, rpn_box_masks in zip(*batch):
            # move everything onto the GPU
            dataa = dataa.as_in_context(mx.gpu(0))
            label = label.as_in_context(mx.gpu(0)).expand_dims(0)
            rpn_cls_targets = rpn_cls_targets.as_in_context(mx.gpu(0))
            rpn_box_targets = rpn_box_targets.as_in_context(mx.gpu(0))
            rpn_box_masks = rpn_box_masks.as_in_context(mx.gpu(0))
            gt_label = label[:, :, 4:5]
            gt_box = label[:, :, :4]
            with autograd.record():
                # network forward
                cls_preds, box_preds, roi, samples, matches, rpn_score, rpn_box, anchors, cls_targets, box_targets, box_masks, _ = net(dataa.expand_dims(0), gt_box, gt_label)
                # losses of rpn
                rpn_score = rpn_score.squeeze(axis=-1)
                num_rpn_pos = (rpn_cls_targets >= 0).sum()
                rpn_loss1 = rpn_cls_loss(rpn_score, rpn_cls_targets, rpn_cls_targets >= 0) * rpn_cls_targets.size / num_rpn_pos
                rpn_loss2 = rpn_box_loss(rpn_box, rpn_box_targets, rpn_box_masks) * rpn_box.size / num_rpn_pos
                # losses of rcnn
                num_rcnn_pos = (cls_targets >= 0).sum()
                rcnn_loss1 = rcnn_cls_loss(cls_preds, cls_targets, cls_targets >= 0) * cls_targets.size / cls_targets.shape[0] / num_rcnn_pos
                rcnn_loss2 = rcnn_box_loss(box_preds, box_targets, box_masks) * box_preds.size / box_preds.shape[0] / num_rcnn_pos
                # some standard gluon training steps:
                autograd.backward([rpn_loss1, rpn_loss2, rcnn_loss1, rcnn_loss2])
            trainer.step(batch_size)
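To check that the losses actually decrease over time, I'm considering a printout right after trainer.step (a sketch reusing the loss variables above; the interval of 100 batches is arbitrary):

            # hypothetical progress printout, same indentation level as trainer.step(batch_size)
            if ib % 100 == 0:
                print('batch {}: rpn_cls={:.4f} rpn_box={:.4f} rcnn_cls={:.4f} rcnn_box={:.4f}'.format(
                    ib,
                    rpn_loss1.mean().asscalar(), rpn_loss2.mean().asscalar(),
                    rcnn_loss1.mean().asscalar(), rcnn_loss2.mean().asscalar()))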
I have doubts about the trainer. I found this in other examples, but I'm not sure whether it works in this context.
trainer = gluon.Trainer(net.collect_params(), 'sgd',{'learning_rate': 0.01, 'wd': 0.05, 'momentum': 0.9})
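For what it's worth, these hyperparameters look much more aggressive than what I've seen in the full training scripts; something closer to this might be intended (the exact values are my own assumption, not taken from the tutorial):

# assumed values, roughly matching GluonCV's reference Faster R-CNN settings
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.001,  # vs. 0.01 above
                         'wd': 5e-4,              # vs. 0.05 above, which is a very strong weight decay
                         'momentum': 0.9})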
EDIT: Here's a copy of the .ipynb file I've been working on (Google Colab version): https://drive.google.com/file/d/1WevimDyTP1lvq_A0OBRMgC-PH8pK4iBv/view?usp=sharing