I'm using a faster rcnn model to run some object detection. The wrapper I'm using is gluon and the code is below:

from gluoncv import model_zoo, data, utils

net = model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)

im_fname = utils.download('https://github.com/dmlc/web-data/blob/master/' +
                      'gluoncv/detection/biking.jpg?raw=true',
                      path='biking.jpg')
x, orig_img = data.transforms.presets.rcnn.load_test(im_fname)

box_ids, scores, bboxes = net(x)

My question: is it possible to reduce the size of the arrays returned by net(x), and so make the computation faster?

The issue is that the model returns box_ids, scores and bboxes as arrays with 80000 elements. Only the first ~10 are useful; the rest have a score of -1. I later convert these arrays to numpy arrays using asnumpy(), but mxnet uses an asynchronous engine, so this call has to wait for the computation to finish before it can return. The computation takes longer (5+ seconds) for 80000 elements, so I am trying to reduce the array size (the SSD model outputs approx. 6000 elements and is much faster).

If you have other suggestions for making .asnumpy() faster, these are welcome too. One pass of an image takes 5 seconds, which seems unreasonable, so I'm looking to reduce it to ~0.2s (which seems more appropriate, right?)

Thanks!

Dave
  • Can you profile this code to see where the bottleneck is occurring? If possible, just index the output to grab the top entries and save them to disk in a compressed format like `bcolz`. – rgalbo Sep 01 '19 at 16:19

1 Answer

You can reduce the maximum number of detected objects by changing the non-maximum suppression (NMS) parameters. See post_nms and set_nms. Reducing these gives you fewer padded entries (i.e. -1s), but you may also miss objects in images that contain a large number of objects.

net.set_nms(nms_thresh=0.5, nms_topk=50)
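Separately from the NMS settings, the padded rows can be dropped once the outputs have been copied to NumPy. The following is only a minimal sketch: it assumes the outputs have shapes (1, N, 1), (1, N, 1) and (1, N, 4) and that padding is marked with a score of -1, as described in the question. The helper name drop_padding is hypothetical, and the arrays are synthetic stand-ins for the real network output.

```python
import numpy as np

def drop_padding(box_ids, scores, bboxes, score_thresh=0.0):
    """Return only the detections whose score is at least score_thresh."""
    # Remove the leading batch axis: (1, N, 1) -> (N,), (1, N, 4) -> (N, 4).
    ids = np.asarray(box_ids)[0].reshape(-1)
    scr = np.asarray(scores)[0].reshape(-1)
    box = np.asarray(bboxes)[0].reshape(-1, 4)
    keep = scr >= score_thresh  # padded rows carry a score of -1
    return ids[keep], scr[keep], box[keep]

# Synthetic stand-in for the network output: 3 real detections, 5 padded rows.
n_real, n_pad = 3, 5
scores = np.concatenate([np.array([0.9, 0.8, 0.6]), -np.ones(n_pad)]).reshape(1, -1, 1)
box_ids = np.concatenate([np.array([1.0, 0.0, 1.0]), -np.ones(n_pad)]).reshape(1, -1, 1)
bboxes = np.concatenate([np.random.rand(n_real, 4), -np.ones((n_pad, 4))]).reshape(1, -1, 4)

ids, scr, box = drop_padding(box_ids, scores, bboxes)
print(ids.shape, scr.shape, box.shape)
```

With the real model you would pass in box_ids.asnumpy(), scores.asnumpy() and bboxes.asnumpy(). This only tidies up the downstream handling; it does not shorten the forward pass itself.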

I don't think this will increase the overall throughput though, since the vast majority of the computation is performed before NMS. I'd recommend you look at other architectures if low latency and high throughput are required. yolo3_darknet53_coco (608x608) isn't far off the mark compared with FasterRCNN in terms of mAP, but has substantially better throughput (~10x).
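To check where the time actually goes, or to compare architectures, you could time the forward pass with a small helper like the one below. This is just a sketch: time_call and fake_forward are hypothetical names, and fake_forward is a placeholder for the real call. Because MXNet executes asynchronously, the timed function has to block until the results are ready (for example by calling mx.nd.waitall() or .asnumpy() inside it), otherwise you only measure how long it takes to queue the work.

```python
import time

def time_call(fn, repeats=5, warmup=1):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded; the first call often includes setup cost
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

def fake_forward():
    # Placeholder for something like: net(x); mx.nd.waitall()
    time.sleep(0.01)

mean_s = time_call(fake_forward)
print(f"mean latency: {mean_s * 1000:.1f} ms")
```

Timing the same image with and without the set_nms change, and then with another model, should show whether NMS is a meaningful part of the 5 seconds.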

Thom Lane