How do I perform data augmentation in object localization?

Question

Performing data augmentation for a classification task is easy, as most transforms do not change the ground-truth label of the image.

However, in the case of object localization:

The position of the bounding box is relative to the crop that has been taken.
The bounding box may be only partially inside the crop window. Do we perform some sort of clipping in this case?
The object's bounding box may also fall entirely outside the crop. Do we discard these examples during training?

I am unable to understand how such cases are handled in object localization. Most papers suggest the use of multi-scale training but don't address these issues.

I found this helpful https://stackoverflow.com/questions/47402896/tf-image-sample-distorted-bounding-box-valueerror — bicepjai, Jan 30 '18 at 01:13

score 2 · Answer 1 · answered May 14 '18 at 03:34

Augmentation methods may alter the content of the bounding box, its coordinates, or both. With color augmentations, the pixel distribution changes but the coordinates of the bounding box do not. With geometric augmentations such as cropping or scaling, both the pixel distribution and the coordinates of the bounding box change. The updated coordinates should be written back to the annotation files so that the training algorithm reads the correct boxes.
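To make the geometric case concrete, here is a minimal sketch in plain Python of one common way to update boxes after a crop or a resize. It is not taken from any particular library or paper. The names `crop_boxes`, `scale_boxes`, and the `min_visible` threshold are hypothetical, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` in pixels. Boxes are clipped to the crop window, shifted into crop coordinates, and dropped when too little of the original box remains visible.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def crop_boxes(boxes: List[Box], crop: Box, min_visible: float = 0.3) -> List[Box]:
    """Clip boxes to a crop window and express them in crop coordinates.

    min_visible is an illustrative threshold: a box is discarded when the
    fraction of its area left inside the crop is below this value.
    """
    cx0, cy0, cx1, cy1 = crop
    kept = []
    for x0, y0, x1, y1 in boxes:
        # Intersection of the box with the crop window.
        ix0, iy0 = max(x0, cx0), max(y0, cy0)
        ix1, iy1 = min(x1, cx1), min(y1, cy1)
        if ix1 <= ix0 or iy1 <= iy0:
            continue  # box lies entirely outside the crop
        area = (x1 - x0) * (y1 - y0)
        visible = (ix1 - ix0) * (iy1 - iy0)
        if area <= 0 or visible / area < min_visible:
            continue  # too little of the object remains
        # Shift so that the crop's top-left corner becomes the origin.
        kept.append((ix0 - cx0, iy0 - cy0, ix1 - cx0, iy1 - cy0))
    return kept


def scale_boxes(boxes: List[Box], sx: float, sy: float) -> List[Box]:
    """Rescale boxes when the image is resized by factors (sx, sy)."""
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]
```

If a crop leaves an image with no boxes at all, whether to resample the crop or keep the image as a background-only example is a design choice that depends on the detector being trained.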

Custom scripts are a common way to solve this problem. However, I have a library in my repository that may help you: https://github.com/lozuwa/impy. With this library you can perform the operations I described previously.
