You could resize the images yourself to get around the rounding error, or you could group images of the same size together into batches.
Assuming you have a generator of images, and that the images have a size attribute, you can create a generator that will yield batches of the images which are all the same size as follows:
from itertools import groupby
from collections import deque, defaultdict


def same_size_batches(images, batch_size=5):
    # One queue of pending images per size
    image_cache = defaultdict(deque)

    # We assume the image object has a size attribute we can group by
    for size, group in groupby(images, key=lambda image: image.size):
        for image in group:
            image_cache[size].append(image)

            # Every time a batch gets big enough, yield it and reset
            if len(image_cache[size]) == batch_size:
                # pop() hands the full deque to the caller, and the defaultdict
                # starts a fresh one for this size on the next append
                yield iter(image_cache.pop(size))
The main part here is groupby, which groups consecutive items that share the same key and returns that key along with an iterator over the matching items. In our case the key is the size of the image.
We then keep a cache of items of the same size, and every time one of the sizes hits our desired batch size, we yield an iterator over that batch.
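Because groupby only merges adjacent items, the same size can come back in a later group, which is why the cache is keyed by size. A small sketch of that behaviour, using made-up size tuples rather than real images:

```python
from itertools import groupby

# Hypothetical sizes in arrival order; the values are made up for illustration
sizes = [(375, 1242), (375, 1242), (370, 1224), (375, 1242)]

# groupby starts a new group whenever the key changes, so the same
# size can appear in more than one group
runs = [(size, len(list(group))) for size, group in groupby(sizes)]
print(runs)
# [((375, 1242), 2), ((370, 1224), 1), ((375, 1242), 1)]
```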
We can demonstrate this working with a fake image object which has the required size attribute:
import random


class FakeImage(object):
    def __init__(self, _id):
        self.id = _id
        self.size = (370, 1224) if random.random() < 0.25 else (375, 1242)

    def __repr__(self):
        return "<Image {} {}>".format(self.id, self.size)


images = (FakeImage(_id) for _id in range(100))
for batch in same_size_batches(images, batch_size=5):
    print(list(batch))
This results in something like:
[<Image 0 (375, 1242)>, <Image 2 (375, 1242)>, <Image 3 (375, 1242)>, <Image 4 (375, 1242)>, <Image 6 (375, 1242)>]
[<Image 7 (375, 1242)>, <Image 8 (375, 1242)>, <Image 9 (375, 1242)>, <Image 10 (375, 1242)>, <Image 12 (375, 1242)>]
[<Image 1 (370, 1224)>, <Image 5 (370, 1224)>, <Image 11 (370, 1224)>, <Image 14 (370, 1224)>, <Image 16 (370, 1224)>]
[<Image 13 (375, 1242)>, <Image 15 (375, 1242)>, <Image 18 (375, 1242)>, <Image 19 (375, 1242)>, <Image 20 (375, 1242)>]
[<Image 21 (375, 1242)>, <Image 23 (375, 1242)>, <Image 24 (375, 1242)>, <Image 25 (375, 1242)>, <Image 26 (375, 1242)>]
...
Images still sitting in a partially filled batch when the input runs out are never yielded, so we aren't guaranteed to produce every image. If the input is an infinite generator, this isn't an issue.
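If the input is finite and you do want every image, one option is to flush the cache once the input is exhausted. The following is only a sketch under that assumption: the name same_size_batches_with_remainder and the Img stand-in are hypothetical, and it yields plain lists rather than iterators to keep things short.

```python
from collections import defaultdict, namedtuple


def same_size_batches_with_remainder(images, batch_size=5):
    """Hypothetical variant that also yields the leftover short batches."""
    image_cache = defaultdict(list)
    for image in images:
        image_cache[image.size].append(image)
        if len(image_cache[image.size]) == batch_size:
            # pop() hands the full batch to the caller and resets that size
            yield image_cache.pop(image.size)
    # Input exhausted: emit whatever is left, one short batch per size
    for leftover in image_cache.values():
        yield leftover


# Stand-in for an image object; only the size attribute matters here
Img = namedtuple("Img", ["id", "size"])
sizes = [(1, 1), (2, 2), (1, 1), (1, 1), (2, 2)]
images = (Img(i, s) for i, s in enumerate(sizes))

batches = [[img.id for img in batch]
           for batch in same_size_batches_with_remainder(images, batch_size=2)]
print(batches)
# [[0, 2], [1, 4], [3]]
```

The last batch is shorter than batch_size, so this only helps if whatever consumes the batches can cope with a variable batch length.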