
For a regression task, I need to train my models to generate density maps from RGB images. To augment my dataset, I decided to flip all the images horizontally. Consequently, I also have to flip the ground-truth images, which I did.

dataset_for_augmentation.listDataset(train_list,
                        shuffle=True,
                        transform=transforms.Compose([
                            transforms.RandomHorizontalFlip(p=1),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
                        ]),
                        target_transform=transforms.Compose([
                            transforms.RandomHorizontalFlip(p=1),
                            transforms.ToTensor()
                        ]),
                        train=True,
                        resize=4,
                        batch_size=args.batch_size,
                        num_workers=args.workers),

But here is the problem: for some reason, PyTorch's transforms.RandomHorizontalFlip accepts only PIL Images as input (NumPy arrays are not allowed). So I decided to convert each array to a PIL Image.

img_path = self.lines[index]

img, target = load_data(img_path, self.train, resize=self.resize)

# PIL cannot build an image from a float64 array, so cast the density map to float32
if target.dtype == np.float64:
    target = np.float32(target)

# convert both arrays to PIL Images so RandomHorizontalFlip accepts them
img = Image.fromarray(img)
target = Image.fromarray(target)

if self.transform is not None:
    img = self.transform(img)
    target = self.target_transform(target)

return img, target

And yes, this operation needs an enormous amount of time. Considering that it has to be carried out for thousands of images, 23 seconds per batch (it should be under half a second at most) is not tolerable.

2019-11-01 16:29:02,497 - INFO - Epoch: [0][0/152]  Time 27.095 (27.095)    Data 23.150 (23.150)    Loss 93.7401 (93.7401)

I would appreciate any suggestions to speed up my augmentation process.

Bedir Yilmaz

2 Answers

You don't need to change the DataLoader to do that. You can use ToPILImage():

transform=transforms.Compose([
    transforms.ToPILImage(),  # check mode assumption in the documentation
    transforms.RandomHorizontalFlip(p=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Anyway, I would avoid converting to PIL: it seems completely unnecessary. If you want to flip all the images, why not do it using NumPy alone?

img_path = self.lines[index]

img, target = load_data(img_path, self.train, resize=self.resize)

if target.dtype == np.float64:
    target = np.float32(target)

# assuming the width axis is 1 -- see my note below
# copy() is needed because np.flip returns a negatively-strided view,
# which torch.from_numpy (used by ToTensor) cannot handle
img = np.flip(img, axis=1).copy()
target = np.flip(target, axis=1).copy()

if self.transform is not None:
    img = self.transform(img)
    target = self.target_transform(target)

return img, target

And remove the transforms.RandomHorizontalFlip(p=1) from the Compose. As ToTensor(...) also handles ndarrays, you are good to go.

Note: I am assuming the width axis is 1, since ToTensor expects (H x W x C) input.

From the docs:

Converts a PIL Image or numpy.ndarray (H x W x C) ...
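
Under that assumption, the two Compose pipelines might reduce to something like this (just a sketch, reusing the normalization constants from the question):

transform = transforms.Compose([
    transforms.ToTensor(),  # accepts (H x W x C) ndarrays directly
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
target_transform = transforms.Compose([
    transforms.ToTensor(),  # a 2-dim target may need a channel axis first, e.g. target[:, :, None]
])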

Berriel
  • Hi, my intention in using transform is to perform data augmentation. In other words, I need both the flipped and the unflipped images, since I want to double the size of my dataset. But now I see that it did not work out the way I tried. – Bedir Yilmaz Nov 02 '19 at 05:37
  • ToPILImage would not work in my case since my target is a 2-dim image; it does not match the H x W x C structure. – Bedir Yilmaz Nov 02 '19 at 05:38
  • @3yanlis1bos I don't understand. At step `t`, do you want the dataloader to provide both flipped and original images (both `img` and `target`)? Let me know. – Berriel Nov 02 '19 at 13:10
  • @3yanlis1bos you probably want `img` and `target` at timestep `t`. That's how augmentation is usually done: the network sees the original example once and the transformed one the second time, so it does not overfit. Are you sure you want both versions at the same time, rather than provided sequentially as is usually done? – Szymon Maszke Nov 02 '19 at 16:56
  • I guess I have caused a bit of a miscommunication. Here is what I really want: I want to include both the original samples and the transformed samples in my training set, separate from each other. I certainly would prefer to have them separated, even in different batches if possible. This is why I think it would be best to use torchdata. – Bedir Yilmaz Nov 02 '19 at 23:55
  • But since I use PyTorch 0.4.1, I am going to have to do it the old way: I will repeat my dataset and transform half of it using a probability of 0.5. This way I will be able to include both the originals and the transformed ones. – Bedir Yilmaz Nov 03 '19 at 00:24
  • It's kinda embarrassing that it took me some time to realize this, but now I understand that randomly transforming half of the images is more or less the same thing as duplicating my dataset with transformed samples. Over numerous epochs, my model will see all the transformed and non-transformed examples anyway. – Bedir Yilmaz Nov 03 '19 at 03:53
  • @3yanlis1bos exactly :) and you get the benefit of not having to store 2 * your database. – Berriel Nov 03 '19 at 12:12

More of an addition to @Berriel's answer.

Horizontal Flip

You are using transforms.RandomHorizontalFlip(p=1) for both X and y images. With p=1, both will be transformed exactly the same way, but you are missing the point of data augmentation, as the network will only ever see flipped images (instead of only original images). You should go for a probability lower than 1 and higher than 0 (usually 0.5) to get high variability in the versions of each image.

If you did use p=0.5, however, your two Compose pipelines would sample the flip independently, so you can be more than certain that a situation will occur where X gets flipped and y doesn't.

I would advise using the albumentations library and its albumentations.augmentations.transforms.HorizontalFlip to do the flip on both images the same way.
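
A minimal sketch of that (assuming img and target are the ndarrays from load_data, with img being H x W x C and target H x W; the mask receives exactly the same spatial transform as the image):

import albumentations as A

# one coin toss is sampled per call and applied to image and mask alike
aug = A.Compose([A.HorizontalFlip(p=0.5)])

out = aug(image=img, mask=target)
img_aug, target_aug = out["image"], out["mask"]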

Normalization

You can find normalization with ImageNet means and stds already set up there as well.
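
For example, extending the sketch above (note that Normalize is applied to the image only; the mask is left untouched):

aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])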

Caching

Furthermore, to speed things up you could use the torchdata third-party library (disclaimer: I'm the author). In your case you could transform the image from PIL to Tensor and Normalize with albumentations, cache the images on disk (or, even better, in RAM) after those transformations with torchdata, and finally apply your flips. This way, only the HorizontalFlip would have to be applied to your image and target after the initial epoch; the previous steps would be precalculated.
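
If you cannot use torchdata (e.g. on PyTorch 0.4.1), the underlying idea also works in plain PyTorch. A rough sketch with hypothetical names, caching the already-normalized tensors in RAM and leaving only the cheap flip in __getitem__:

import torch
from torch.utils.data import Dataset

class CachedFlipDataset(Dataset):
    def __init__(self, samples, p=0.5):
        # samples: list of (img, target) tensor pairs, already converted
        # with ToTensor and normalized -- computed once, kept in RAM
        self.samples = list(samples)
        self.p = p

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        img, target = self.samples[index]
        if torch.rand(1).item() < self.p:
            # the same coin toss flips the last (width) axis of both tensors
            img = torch.flip(img, dims=[-1])
            target = torch.flip(target, dims=[-1])
        return img, target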

Szymon Maszke
  • Thank you, precalculating the augmentation steps was my initial plan, but then I got lost in this particular step. I will definitely try to implement your suggestions. – Bedir Yilmaz Nov 02 '19 at 02:52
  • Btw, was my assumption about setting p to 1 in RandomHorizontalFlip not true? I would like to understand why X and y would be transformed differently in this case. – Bedir Yilmaz Nov 02 '19 at 02:53
  • @3yanlis1bos I updated my answer regarding flipping. Yes, you would get the exact same transformation, but you are not augmenting the dataset this way, as __all images__ would get flipped. Augmentation usually enhances the size of the dataset and its variability; here it would remain exactly the same. – Szymon Maszke Nov 02 '19 at 09:48
  • Yes, this turned out to be my mistake. I was hoping to enlarge my dataset with those transformations! :d Turns out what I was asking about is not my main problem. This makes your answer even more valuable, thank you. – Bedir Yilmaz Nov 02 '19 at 16:46