I am facing a bit of a bizarre problem: I have a bunch of differently sized images that I am trying to train and run inference on, and I have the following example transform code:

import augly.image as imaugs
import augly.utils as utils
import torchvision
from torchvision import transforms

        self.infer_transform = transforms.Compose([
            imaugs.PadSquare(p=1),
            transforms.Resize([384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

When I use a batch size > 1, I get the following error:

RuntimeError: stack expects each tensor to be equal size, but got [3, 384, 384] at entry 0 and [3, 385, 384] at entry 3

I find this really bizarre, since after PadSquare, resizing with a single int should give me back a square image, but it seems like it does not. Why is this? Is this a bug? It almost seems like some kind of round-off error (got [3, 384, 384] at entry 0 but [3, 385, 384] at entry 3).
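
To illustrate, here is a minimal sketch of what I suspect is happening. The 300x301 image is made up: I am assuming PadSquare can leave the image one pixel off from exactly square, which I have not actually verified.

    import torchvision
    from PIL import Image
    from torchvision import transforms

    # Hypothetical padded image that is one pixel off from square
    img = Image.new("RGB", (300, 301))

    # A single int (or a one-element list) resizes the *shorter* edge to 384
    # and scales the other edge to preserve the aspect ratio
    single = transforms.Resize(
        [384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC)
    print(single(img).size)  # (384, 385), which ToTensor turns into [3, 385, 384]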

However, if I do this:

        self.infer_transform = transforms.Compose([
            imaugs.PadSquare(p=1),
            transforms.Resize((384,384), interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

it works fine...
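
For the same made-up 300x301 image, the (h, w) pair forces the exact output size, which I assume is why this version batches without complaint:

    import torchvision
    from PIL import Image
    from torchvision import transforms

    img = Image.new("RGB", (300, 301))  # same hypothetical off-by-one image

    # An (h, w) pair forces both edges to 384; the aspect ratio is not preserved
    pair = transforms.Resize(
        (384, 384), interpolation=torchvision.transforms.InterpolationMode.BICUBIC)
    print(pair(img).size)  # (384, 384)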

What is the reason behind this? I am perplexed! When I try out sample images in, say, Colab, they seem to come out the same size...
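
For reference, this is roughly how I have been spot-checking sizes in Colab; the image sizes below are made-up stand-ins for my real data, not the actual dataset:

    import augly.image as imaugs
    import torchvision
    from PIL import Image
    from torchvision import transforms

    infer_transform = transforms.Compose([
        imaugs.PadSquare(p=1),
        transforms.Resize([384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor()])

    # A few made-up image sizes standing in for my real data
    samples = [Image.new("RGB", (640, 480)), Image.new("RGB", (480, 640)), Image.new("RGB", (512, 512))]
    for img in samples:
        print(img.size, "->", infer_transform(img).shape)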
