Performance of MaskRCNN/YOLO as a function of object size in pixels

Question

I am trying to find references on how the resolution of an object affects the ability of object detection systems such as MaskRCNN and YOLO to correctly identify the object.

For example, if the camera is zoomed further and further out, the number of pixels making up the object will shrink, and eventually the object will occupy just a single pixel. At this point the algorithm can only use the values of that single pixel, and so it seems unlikely that even a very accurate algorithm will be able to make a detection. I'm hoping to find any sort of reference for how the performance degrades as the pixels per object are decreased.

score 1 · Accepted Answer · answered Jan 10 '20 at 09:54

First, I think the experimental results in most object detection papers support your intuition that lower resolution leads to lower detection accuracy/precision. For example, if you look at AP-S, AP-M and AP-L (i.e. average precision for small, medium and large objects) in the experimental results of object detection papers such as YOLOv3 (table 3), you will notice a huge drop in AP-S compared to AP-M and AP-L, especially for one-shot methods.

Second, I think a good starting point for getting some experimental support for your claim is to use the COCO dataset and slightly modify the cocoeval scripts that come with the cocoapi (if I'm not mistaken, those in cocoapi/PythonAPI/pycocotools/). As the documentation states, the default values for small, medium and large objects are as follows:

APsmall: AP for small objects, area < 32^2
APmedium: AP for medium objects, 32^2 < area < 96^2
APlarge: AP for large objects, area > 96^2
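
To make those thresholds concrete: areas are measured in pixels, so 32^2 = 1024 and 96^2 = 9216. Here is a minimal sketch that sorts an area into the three buckets. The function name `coco_size_bucket` is made up for illustration, and putting an area that falls exactly on a boundary into the larger bucket is my own assumption, since the documented inequalities leave the boundaries open.

```python
SMALL_MAX_AREA = 32 ** 2   # 1024 pixels
MEDIUM_MAX_AREA = 96 ** 2  # 9216 pixels


def coco_size_bucket(area):
    """Return 'small', 'medium' or 'large' for an object area in pixels.

    Hypothetical helper, not part of the cocoapi. Boundary values are
    assigned to the larger bucket (an assumption made for this sketch).
    """
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


if __name__ == "__main__":
    # A 20x20, a 50x50 and a 200x200 pixel object, respectively.
    for side in (20, 50, 200):
        print(side, coco_size_bucket(side * side))
```

So an object that is roughly 20 pixels on each side already counts as "small" under the default settings.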

You can start by looping over the upper threshold for small objects: begin at 32^2, decrease it until you reach some minimum area, and look at how the AP-small score changes as a function of this threshold. This is likely to give a decreasing curve, which will illustrate your point.
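
A rough sketch of that loop, assuming pycocotools is installed and that you have a COCO ground-truth annotation file and a detection results file in COCO format. Instead of editing the scripts it overrides the "small" area range on the evaluator's `params` object. The helper names (`small_area_ranges`, `sweep_ap_small`), the step size and the minimum side are placeholders of mine, not part of the cocoapi.

```python
def small_area_ranges(max_side=32, min_side=4, step=4):
    """Return (side, [min_area, max_area]) pairs with a shrinking upper bound."""
    return [(side, [0, side ** 2])
            for side in range(max_side, min_side - 1, -step)]


def sweep_ap_small(ann_file, det_file, iou_type="bbox", **range_kwargs):
    """Compute AP-small once per threshold; returns {side: AP-small}."""
    # Imported here so the helper above works without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(ann_file)            # ground-truth annotations
    coco_dt = coco_gt.loadRes(det_file)  # detections in COCO results format

    results = {}
    for side, area_range in small_area_ranges(**range_kwargs):
        evaluator = COCOeval(coco_gt, coco_dt, iou_type)
        # Default order of areaRng is all, small, medium, large;
        # only the "small" entry (index 1) is replaced.
        evaluator.params.areaRng[1] = area_range
        evaluator.evaluate()
        evaluator.accumulate()
        evaluator.summarize()           # also prints the usual table
        # stats[3] is the AP reported for the "small" range.
        results[side] = evaluator.stats[3]
    return results


if __name__ == "__main__":
    # Only the thresholds are printed here; the sweep itself needs real files.
    for side, area_range in small_area_ranges():
        print(side, area_range)
```

Calling `sweep_ap_small("instances_val.json", "detections.json")` (both file names are placeholders) would give one AP-small value per threshold, which you can then plot against the side length. Keep in mind that very low thresholds may leave few or no ground-truth objects in the range, in which case the reported value is -1 rather than a meaningful AP.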
