
In PyTorch, I know that certain image processing transformations can be composed like this:

import torchvision.transforms as transforms
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

In my case, each image has a corresponding annotation of bounding box coordinates in YOLO format. Does PyTorch allow applying these transformations to the bounding box coordinates of the image as well, and later saving them as new annotations? Thanks.

Karen



The transformations that you used as examples do not change the bounding box coordinates. ToTensor() converts a PIL image to a torch tensor with values scaled to [0, 1], and Normalize() normalizes each channel with the given mean and standard deviation. Neither moves any pixels.
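
To see why these two leave the boxes alone, here is a minimal pure-Python sketch of the per-pixel arithmetic they perform, assuming an 8-bit input and the mean/std of 0.5 used in the question. The function name is hypothetical, not a torchvision API. Each value is remapped independently and nothing changes position, so box coordinates still match the image.

```python
def to_tensor_then_normalize(pixel, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize(mean, std) for one 8-bit channel value."""
    scaled = pixel / 255.0          # ToTensor(): [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std    # Normalize(): [0.0, 1.0] -> [-1.0, 1.0]


# Every value is mapped on its own; the (row, column) layout is untouched.
row = [0, 51, 255]
print([to_tensor_then_normalize(p) for p in row])
```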

Geometric transformations such as RandomCrop() and RandomRotation(), on the other hand, move pixels, so applying them to the image alone will cause a mismatch between the location of the bounding box and the (modified) image.
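
A horizontal flip is the simplest case for seeing both the mismatch and its fix. Assuming the YOLO layout (class, x_center, y_center, width, height) with coordinates normalized to [0, 1], flipping the image left to right only mirrors x_center. The helper below is a hypothetical sketch; the image itself would be flipped separately, e.g. with torchvision.transforms.functional.hflip.

```python
def hflip_yolo_boxes(boxes):
    """Mirror YOLO boxes to match a left-right flip of the image.

    boxes: list of (class_id, x_center, y_center, width, height), normalized to [0, 1].
    """
    # Only the horizontal centre moves; box size and vertical position are unchanged.
    return [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in boxes]


boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
print(hflip_yolo_boxes(boxes))  # [(0, 0.75, 0.5, 0.2, 0.4)]
```

A vertical flip works the same way with y_center replaced by 1 - y_center.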

However, PyTorch makes it easy to write your own transformations and control what happens to the bounding box coordinates.

Docs for more details: https://pytorch.org/docs/stable/torchvision/transforms.html#functional-transforms
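
Because torchvision's Compose passes a single argument through the chain, one way to keep image and boxes in sync is a small Compose-like class whose steps take and return an (image, boxes) pair. PairCompose below is a hypothetical helper, not part of torchvision, and the toy vflip_pair step uses nested lists in place of a real image only to keep the sketch self-contained. Any function with the same (image, boxes) signature, such as my_rotation below, could be one of its steps.

```python
class PairCompose:
    """Chain transforms that each map (image, boxes) -> (image, boxes).

    Hypothetical helper, modelled loosely on torchvision.transforms.Compose.
    """

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, image, boxes):
        for t in self.transforms:
            image, boxes = t(image, boxes)
        return image, boxes


def vflip_pair(image, boxes):
    # Toy step: image is a list of pixel rows, boxes are YOLO (class, xc, yc, w, h) tuples.
    return image[::-1], [(c, xc, 1.0 - yc, w, h) for c, xc, yc, w, h in boxes]


transform = PairCompose([vflip_pair])
img, bxs = transform([[1], [2]], [(0, 0.25, 0.25, 0.2, 0.4)])
print(img, bxs)  # [[2], [1]] [(0, 0.25, 0.75, 0.2, 0.4)]
```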

As an example (modified from the documentation):

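import math
import random
import torch
import torchvision.transforms.functional as TF

def my_rotation(image, bounding_box):
    # image: PIL image; bounding_box: tensor [x_min, y_min, x_max, y_max] in pixels
    if random.random() > 0.5:
        angle = random.randint(-30, 30)
        image = TF.rotate(image, angle)  # counter-clockwise about the image centre
        # TF.rotate only accepts images, so rotate the four box corners by hand
        centre = torch.tensor([image.width / 2, image.height / 2])
        x0, y0, x1, y1 = bounding_box.tolist()
        corners = torch.tensor([[x0, y0], [x1, y0], [x0, y1], [x1, y1]]) - centre
        c, s = math.cos(math.radians(angle)), math.sin(math.radians(angle))
        rot = torch.tensor([[c, s], [-s, c]])  # y axis points down: on-screen CCW turn
        corners = corners @ rot.T + centre
        bounding_box = torch.cat([corners.min(dim=0).values, corners.max(dim=0).values])
    # more transforms ...
    return image, bounding_box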
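
YOLO annotations store (x_center, y_center, width, height) normalized by the image size, while geometric transforms are easier to reason about in absolute corner coordinates. A minimal sketch of converting between the two, assuming you know the image width and height in pixels; both helper names are hypothetical.

```python
def yolo_to_xyxy(box, img_w, img_h):
    """(xc, yc, w, h) normalized -> (x_min, y_min, x_max, y_max) in pixels."""
    xc, yc, w, h = box
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


def xyxy_to_yolo(box, img_w, img_h):
    """(x_min, y_min, x_max, y_max) in pixels -> (xc, yc, w, h) normalized."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
            (x1 - x0) / img_w, (y1 - y0) / img_h)


pixels = yolo_to_xyxy((0.5, 0.5, 0.25, 0.5), img_w=640, img_h=480)
print(pixels)                          # (240.0, 120.0, 400.0, 360.0)
print(xyxy_to_yolo(pixels, 640, 480))  # (0.5, 0.5, 0.25, 0.5)
```

After a rotation the enclosing box can extend past the image border, so clamping the corners to [0, img_w] and [0, img_h] before converting back is usually sensible.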

Hope that helps =)

Victor Zuanazzi
  • Thanks for your answer, Victor; I see the benefits of the code you posted. Could you specify what kind of object "bonding_box_coordinate" is? Most annotations are usually stored in .txt or .xml files. In the original documentation, it was also not clear what kind of object segmentation is. Thanks. – Karen Jul 24 '20 at 07:28
  • I just assumed all inputs are torch tensors. You can convert the input to torch tensors in the dataloader or create a transform that does that for you. – Victor Zuanazzi Jul 24 '20 at 07:35
  • There is a YOLOv3 implementation in PyTorch you can check for inspiration: https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/47b7c912877ca69db35b8af3a38d6522681b3bb3/utils/datasets.py#L130 – Victor Zuanazzi Jul 24 '20 at 07:47
  • I really don't see how you would pass the input data as a tensor such as [0.12 0.45 0.48 0.34]. Passing it to TF.rotate will not work, since it is not an image. – Guenter Jan 30 '23 at 13:58
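
On the file-format question: a YOLO .txt annotation holds one line per object, `class x_center y_center width height`, with coordinates normalized to [0, 1]. Below is a minimal sketch of loading such a file into plain numbers (torch.tensor(...) on the coordinates gives a tensor if your transform needs one) and writing transformed boxes back as a new annotation file. The function names, the example paths, and the six-decimal output precision are all assumptions.

```python
from pathlib import Path


def parse_yolo(text):
    """Parse YOLO annotation text into a list of (class_id, xc, yc, w, h)."""
    boxes = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            class_id, *coords = line.split()
            boxes.append((int(class_id), *map(float, coords)))
    return boxes


def format_yolo(boxes):
    """Format (class_id, xc, yc, w, h) tuples as YOLO annotation text."""
    return "".join(f"{c} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n" for c, xc, yc, w, h in boxes)


def load_yolo(path):
    return parse_yolo(Path(path).read_text())


def save_yolo(path, boxes):
    Path(path).write_text(format_yolo(boxes))


# Hypothetical usage:
#   boxes = load_yolo("labels/img_001.txt")
#   ... transform the image and boxes together ...
#   save_yolo("labels/img_001_aug.txt", new_boxes)
```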