
I am using KITTI's object detection dataset to train a Faster R-CNN with ResNet-101, pretrained on MS COCO. KITTI images are "mostly" of dimension 375x1242.

When I had batch_size: 1, everything worked perfectly. My keep_aspect_ratio_resizer was as below, as proposed in TensorFlow's sample configs:

min_dimension: 600
max_dimension: 1987

But now I want to use batch_size: 5, and I keep getting dimension-mismatch errors, because some of the images are slightly smaller, e.g. 370x1224.

I can't find general keep_aspect_ratio_resizer values that work for all images. I tried the values below, based on the sizes I saw in the error messages, but I can't get all images resized to the same dimensions:

min_dimension: 600
max_dimension: 1985

min_dimension: 599
max_dimension: 1985
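A sketch of why no single pair of values works, assuming the resizer's usual behaviour (scale so the short side reaches min_dimension, rescale if the long side would then exceed max_dimension, round to integers) — this is my approximation of the resizer, not the actual TensorFlow code:

```python
def keep_aspect_ratio_resize(h, w, min_dim, max_dim):
    """Approximate keep_aspect_ratio_resizer: scale the short side to
    min_dim, but clamp the long side to max_dim; round to integers."""
    scale = min_dim / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    if max(new_h, new_w) > max_dim:
        scale = max_dim / max(h, w)
        new_h, new_w = round(h * scale), round(w * scale)
    return new_h, new_w

# The two KITTI sizes land on different output shapes either way:
print(keep_aspect_ratio_resize(375, 1242, 600, 1987))  # (600, 1987)
print(keep_aspect_ratio_resize(370, 1224, 600, 1987))  # (600, 1985)
print(keep_aspect_ratio_resize(375, 1242, 600, 1985))  # (599, 1985)
print(keep_aspect_ratio_resize(370, 1224, 600, 1985))  # (600, 1985)
```

Because the two aspect ratios differ slightly (375/1242 vs 370/1224), any min/max pair that makes one size come out exact makes the other round to something different, so batching them together always fails.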
kneazle
  • have you tried `tf.image.crop_and_resize`? How is input pipeline constructed? – Sharky Mar 14 '19 at 17:23
  • I'm training with legacy/train.py. I didn't change anything there, and also in the config file. I don't know where to put `tf.image.crop_and_resize` honestly... I tried putting `fixed_shape_resizer` before `keep_aspect_ratio_resizer` in the config file, however it didn't work – kneazle Mar 14 '19 at 17:42
  • Consider adding some code or link to github – Sharky Mar 14 '19 at 17:44
  • I used `tf.image.resize_image_with_crop_or_pad` instead. looked like a better option for my case. Thanks a lot for the tip! – kneazle Mar 14 '19 at 20:19

2 Answers


You could resize the images yourself to get around the rounding error here, or you could try to group items of the same size together into batches.

Assuming you have a generator of images, and that each image has a size attribute, you can create a generator that yields batches of images that are all the same size, as follows:

from itertools import groupby
from collections import deque, defaultdict


def same_size_batches(images, batch_size=5):
    image_cache = defaultdict(deque)

    # We assume the image object has a size attribute we can group by
    for size, group in groupby(images, key=lambda image: image.size):
        for image in group:
            image_cache[size].append(image)

            # Every time a batch gets big enough, yield a snapshot of it
            # and reset (the snapshot keeps the batch valid even if the
            # caller consumes it after we resume and clear the cache)
            if len(image_cache[size]) == batch_size:
                yield iter(tuple(image_cache[size]))
                image_cache[size].clear()

The main part here is `groupby`, which groups consecutive items by the same key and returns that key along with a generator of the items matching it. In our case the key is the size of the image.

We then keep a cache of items per size, and every time one of the sizes reaches the desired batch size, we yield a generator for that batch.

We can demonstrate this working with a fake image object which has the required size parameter:

import random


class FakeImage(object):
    def __init__(self, _id):
        self.id = _id
        self.size = (370, 1224) if random.random() < 0.25 else (375, 1242)

    def __repr__(self):
        return "<Image {} {}>".format(self.id, self.size)


images = (FakeImage(_id) for _id in range(100))
for batch in same_size_batches(images, batch_size=5):
    print(list(batch))

This results in something like:

[<Image 0 (375, 1242)>, <Image 2 (375, 1242)>, <Image 3 (375, 1242)>, <Image 4 (375, 1242)>, <Image 6 (375, 1242)>]
[<Image 7 (375, 1242)>, <Image 8 (375, 1242)>, <Image 9 (375, 1242)>, <Image 10 (375, 1242)>, <Image 12 (375, 1242)>]
[<Image 1 (370, 1224)>, <Image 5 (370, 1224)>, <Image 11 (370, 1224)>, <Image 14 (370, 1224)>, <Image 16 (370, 1224)>]
[<Image 13 (375, 1242)>, <Image 15 (375, 1242)>, <Image 18 (375, 1242)>, <Image 19 (375, 1242)>, <Image 20 (375, 1242)>]
[<Image 21 (375, 1242)>, <Image 23 (375, 1242)>, <Image 24 (375, 1242)>, <Image 25 (375, 1242)>, <Image 26 (375, 1242)>]
...

We aren't guaranteed to produce every image: if a size never accumulates a full batch, its leftovers are silently dropped. If the input is an infinite generator this isn't an issue.
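If the input is finite and every image matters, one possible variant (a hypothetical `same_size_batches_with_flush`, my own sketch rather than part of the answer above) flushes the incomplete batches once the input is exhausted:

```python
from collections import defaultdict


def same_size_batches_with_flush(images, batch_size=5):
    cache = defaultdict(list)
    for image in images:
        cache[image.size].append(image)
        if len(cache[image.size]) == batch_size:
            # Full batch: hand it out and drop it from the cache
            yield cache.pop(image.size)
    # Input exhausted: flush any partially filled batches
    for leftover in cache.values():
        yield leftover
```

Note that the flushed batches are shorter than batch_size, so downstream code has to tolerate that (or pad them up to size).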

Jon Betts

Fixed by adding tf.image.resize_image_with_crop_or_pad(images, max_height, max_width) to create_input_queue() in https://github.com/tensorflow/models/blob/master/research/object_detection/legacy/trainer.py
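For anyone wondering what that call does: it center-crops dimensions that are too large and zero-pads dimensions that are too small, so every image comes out at exactly (max_height, max_width). A rough NumPy sketch of that behaviour (a simplified stand-in I wrote for illustration, not TensorFlow's implementation):

```python
import numpy as np


def crop_or_pad(image, target_h, target_w):
    """Center-crop oversized dimensions, zero-pad undersized ones,
    so the result is exactly (target_h, target_w, ...)."""
    h, w = image.shape[:2]
    # Crop any dimension that is too large
    if h > target_h:
        off = (h - target_h) // 2
        image = image[off:off + target_h]
    if w > target_w:
        off = (w - target_w) // 2
        image = image[:, off:off + target_w]
    h, w = image.shape[:2]
    # Pad any dimension that is too small
    pad_top = (target_h - h) // 2
    pad_left = (target_w - w) // 2
    pads = [(pad_top, target_h - h - pad_top),
            (pad_left, target_w - w - pad_left)]
    pads += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pads)
```

So padding every 370x1224 KITTI image up to 375x1242 (rather than resizing it) keeps the pixel content unchanged and makes all batch elements the same shape, which is why it worked here.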

kneazle