I have just done some transfer learning on a Faster R-CNN using the TensorFlow Object Detection API. I am on TensorFlow 1.14, and the backbone network is faster_rcnn_resnet101_coco. Do frozen networks resize images fed to them when making predictions?
I ask because when I feed the model an image that is much larger than the ones I trained on, it doesn't recognize any of the objects. When I crop that image down to 1200x1200, the objects themselves are unchanged, but detection works great.
Does the model include image size constraints? Should I be making predictions using similar dimensions to those in the config file, even though the objects are the same size in the 3000x3000 image?
In the config file for training, I constrain the input images:
image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 200
    max_dimension: 1200
  }
}
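My reading of keep_aspect_ratio_resizer (an assumption based on the config field names, not something I have verified in the Object Detection API source) is that it scales an image so its shorter side reaches min_dimension, unless that would push the longer side past max_dimension, in which case the longer side is clamped to max_dimension instead. A minimal sketch of that logic, where resized_shape is just my own illustrative helper:

def resized_shape(height, width, min_dimension=200, max_dimension=1200):
    # Scale so the short side hits min_dimension...
    scale = min_dimension / min(height, width)
    # ...but never let the long side exceed max_dimension.
    if max(height, width) * scale > max_dimension:
        scale = max_dimension / max(height, width)
    return int(round(height * scale)), int(round(width * scale))

print(resized_shape(3000, 3000))  # (200, 200): a square input lands at min_dimension x min_dimension
print(resized_shape(200, 2000))   # (120, 1200): the long side gets clamped to max_dimension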
Does this mean that the trained model I am now using will scale an image down if I feed it something larger than 1200x1200? Here is how I make predictions with the loaded frozen graph:
import tensorflow as tf

with model.as_default():
    with tf.Session(graph=model) as sess:
        # Input placeholder and output tensors of the frozen detection graph
        imageTensor = model.get_tensor_by_name("image_tensor:0")
        boxesTensor = model.get_tensor_by_name("detection_boxes:0")
        scoresTensor = model.get_tensor_by_name("detection_scores:0")
        classesTensor = model.get_tensor_by_name("detection_classes:0")
        numDetections = model.get_tensor_by_name("num_detections:0")

        # Make prediction; image is a uint8 array of shape [1, height, width, 3]
        (boxes, scores, labels, N) = sess.run(
            [boxesTensor, scoresTensor, classesTensor, numDetections],
            feed_dict={imageTensor: image})
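If the answer is that I should match the training dimensions myself, this is the kind of manual pre-resize I have in mind before calling sess.run (cap_longest_side is a hypothetical helper of my own, not part of the Object Detection API):

import cv2
import numpy as np

def cap_longest_side(image, max_dimension=1200):
    # Shrink the image so its longest side is at most max_dimension, keeping the aspect ratio.
    height, width = image.shape[:2]
    scale = max_dimension / max(height, width)
    if scale >= 1.0:
        return image  # already small enough
    new_size = (int(round(width * scale)), int(round(height * scale)))  # cv2.resize wants (w, h)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)

small = cap_longest_side(big_image)      # big_image: uint8 array of shape [H, W, 3]
image = np.expand_dims(small, axis=0)    # image_tensor expects a batch dimension: [1, H, W, 3]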
Related: Training Image Size Faster-RCNN
Also, this post makes me think it should handle any input size, but it clearly doesn't handle them the same, so I'm confused: Faster RCNN + inception v2 input size