I want to retrain a quantized MobileNet-SSD V2 model, so I downloaded the unlabeled folder from COCO. This model requires an input size of 300x300, but I succeeded in retraining it once on pictures of a different size and it worked (poorly, but it worked). Also, the code that uses the retrained model resizes the input from the camera to 500x500 and it works. So my question is: why is it written that the required input is 300x300 if it works with other sizes too? Do I need to resize the whole dataset to 300x300 before I label it? I know the model performs convolution on the input, so I don't think the size really matters (correct me if I'm wrong). As far as I know, the convolution is applied until we reach the end of the input.

Thanks for helping!

Gal Elias

1 Answer

If I understand correctly, you are using the TF Object Detection API. A given model, such as mobilenet-v2-ssd, contains 3 main blocks: [preprocessing (normalizing and resizing)] --> [detector (backbone + detection heads)] --> [postprocessing (bbox decoding + NMS)]

When they talk about the required input, they mean the input to the detector. The checkpoint itself contains the full pipeline, which means the preprocessing unit will do the work for you, so there is no need to resize to 300x300 beforehand.

If for some reason you intend to inject the input yourself directly into the detector, you have to apply the same preprocessing that was done during training.
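A minimal sketch of what that manual preprocessing would look like, assuming the 300x300 fixed-shape resize and MobileNet normalization described below (the `preprocess` helper and the nearest-neighbor resize are my own illustration; the API does this inside the graph with its own resizer):

```python
import numpy as np

def preprocess(image, size=300):
    """Nearest-neighbor resize to size x size, then MobileNet-style
    normalization from [0, 255] to [-1, 1]."""
    h, w = image.shape[:2]
    # Map each output pixel back to a source pixel (nearest neighbor).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols]  # shape: (size, size, 3)
    return resized.astype(np.float32) / 127.5 - 1.0

# Fake camera frame of an arbitrary size, e.g. 500x500 RGB.
frame = np.random.randint(0, 256, (500, 500, 3), dtype=np.uint8)
inp = preprocess(frame)
print(inp.shape)  # (300, 300, 3)
```

Feeding `inp` (with a batch dimension added) to the detector would then match what it saw during training.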

BTW: in the training config file (https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config) you can see the resizer that was defined: `image_resizer { fixed_shape_resizer { height: 300 width: 300 } }`. The normalization is the MobileNet normalization (changing the dynamic range of the input from [0, 255] to [-1, 1]).
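That normalization is just a linear rescale, `x / 127.5 - 1`; a quick endpoint check (my own arithmetic, not code from the API):

```python
# Maps 0 -> -1.0, 127.5 -> 0.0, 255 -> 1.0
normalize = lambda x: x / 127.5 - 1.0

print(normalize(0.0))    # -1.0
print(normalize(127.5))  # 0.0
print(normalize(255.0))  # 1.0
```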

Tamir Tapuhi
  • Thanks for answering! After I looked at the files in the folder I did see that the pictures are being resized to 300x300 and normalized (I hadn't noticed it before I posted this thread). I think I understood why it has to be 300x300 even though convolution isn't really limited by input size: it's because the output size depends on the input size, and it won't fit the fully connected part (correct me if I'm wrong). What I didn't understand is why, when I resize it to something different from 300x300, it still works (not in the training part). – Gal Elias Apr 13 '20 at 17:36
  • 1. Are you sure you insert a different input size to the network itself? The input tensor of the TF OD API is before preprocessing, which means that whatever shape you insert, it will still reach the network as 300x300. 2. Insertion of different shapes may work, because the network is fully convolutional, yet you should expect degradation. I've tried to run the trained mobilenet-v1-ssd on images wi… – Tamir Tapuhi Apr 14 '20 at 06:45