
I have some Python code, largely adapted from the sources linked at the bottom of this post, that takes an image of shape [height, width] and some bounding boxes in the [x_min, y_min, x_max, y_max] format (both numpy arrays) and rotates the image and its bounding boxes counterclockwise. Since a rotated bounding box is no longer axis-aligned (it becomes more of a "diamond" shape), I then perform some calculations to produce a new axis-aligned box that encloses it. The purpose of this code is data augmentation for training an object detection neural network: horizontal and vertical flips are common, and rotations by other angles are common in image classification (where there are no boxes), but resources on how to rotate the boxes along with the image are relatively sparse.

When I input an angle of 45 degrees, I get some less-than-tight bounding boxes: the four corners are not a very good annotation, whereas the original box was close to perfect.

The image shown below is the first training image in the MS COCO 2014 object detection dataset, together with its first bounding box annotation. My code is as follows:

import math
import cv2
import numpy as np

# angle: assumed to be in degrees
# bbs: a list of bounding boxes in [x_min, y_min, x_max, y_max] format
def rotateImageAndBoundingBoxes(im, bbs, angle):
    h, w = im.shape[0], im.shape[1]
    (cX, cY) = (w//2, h//2) # original image center
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0) # 2 by 3 rotation matrix
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    
    # compute the dimensions of the rotated image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    
    # adjust the rotation matrix to account for the translation to the new center
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    rotated_im = cv2.warpAffine(im, M, (nW, nH))

    rotated_bbs = []
    for bb in bbs:
        # get the four rotated corners of the bounding box
        vec1 = np.matmul(M, np.array([bb[0], bb[1], 1], dtype=np.float64)) # top left corner transformed
        vec2 = np.matmul(M, np.array([bb[2], bb[1], 1], dtype=np.float64)) # top right corner transformed
        vec3 = np.matmul(M, np.array([bb[0], bb[3], 1], dtype=np.float64)) # bottom left corner transformed
        vec4 = np.matmul(M, np.array([bb[2], bb[3], 1], dtype=np.float64)) # bottom right corner transformed
        x_vals = [vec1[0], vec2[0], vec3[0], vec4[0]]
        y_vals = [vec1[1], vec2[1], vec3[1], vec4[1]]
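        # round inward (ceil the mins, floor the maxes) so the integer box stays inside the transformed corners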
        x_min = math.ceil(np.min(x_vals))
        x_max = math.floor(np.max(x_vals))
        y_min = math.ceil(np.min(y_vals))
        y_max = math.floor(np.max(y_vals))
        bb = [x_min, y_min, x_max, y_max]
        rotated_bbs.append(bb)
    
    # my function to resize the image and boxes back to the original image size
    rotated_im, rotated_bbs = resizeImageAndBoxes(rotated_im, w, h, rotated_bbs)
    
    return rotated_im, rotated_bbs
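
For completeness: resizeImageAndBoxes is my own helper and is not shown above. A minimal sketch of what it might look like, assuming a plain cv2.resize back to the original width/height with the box coordinates scaled by the same factors:

# hypothetical reconstruction of the resize helper: a plain cv2.resize
# plus linear scaling of the box coordinates
def resizeImageAndBoxes(im, new_w, new_h, bbs):
    h, w = im.shape[0], im.shape[1]
    resized_im = cv2.resize(im, (new_w, new_h))  # dsize is (width, height)
    sx, sy = new_w / w, new_h / h  # scale factors from rotated size to original size
    resized_bbs = []
    for bb in bbs:
        resized_bbs.append([int(bb[0] * sx), int(bb[1] * sy),
                            int(bb[2] * sx), int(bb[3] * sy)])
    return resized_im, resized_bbs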

The good bounding box (the original annotation) looks like this: [image]

The not-so-good bounding box (after the 45-degree rotation) looks like this: [image]

I am trying to determine whether this is an error in my code or expected behavior. The problem is much less apparent at integer multiples of pi/2 radians (90 degrees), but I would like to achieve tight bounding boxes at any angle of rotation. A quick numerical check of how much the axis-aligned enclosure of a rotated box grows with the angle is sketched below. Any insights at all appreciated.
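
The check is just the standard formula for the axis-aligned extent of a w x h rectangle rotated by theta, namely w*|cos(theta)| + h*|sin(theta)| wide by w*|sin(theta)| + h*|cos(theta)| tall, applied to an illustrative (made-up) 100 x 50 box:

import math

# area of the axis-aligned enclosure of a rotated w x h box,
# relative to the area of the box itself
def aabb_growth(w, h, angle_deg):
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return ((w * c + h * s) * (w * s + h * c)) / (w * h)

for angle in (0, 15, 30, 45, 90):
    print(angle, round(aabb_growth(100, 50, angle), 3))
# 45 degrees gives the largest growth (2.25x here); 0 and 90 give exactly 1.0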

Sources:

[OpenCV documentation: geometric image transformations] https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#gafbbc470ce83812914a70abfb604f4326

[Data augmentation for object detection: rotation and shearing] https://blog.paperspace.com/data-augmentation-for-object-detection-rotation-and-shearing/

[Rotation matrix of rotation around a point other than the origin] https://math.stackexchange.com/questions/2093314/rotation-matrix-of-rotation-around-a-point-other-than-the-origin

IntegrateThis
  • Why not use the same `cv2.warpAffine` to rotate the bounding boxes as well? – Abhi25t Jan 14 '21 at 10:01
  • You are computing the bounding box of a rotated box. This will naturally be larger than the box. Rotating a box by 45 degrees makes the bounding box the largest. I welcome you to draw a box on a piece of paper, rotate the paper, then draw a box that completely encloses the first box, and compare their sizes. – Cris Luengo Jan 14 '21 at 15:20
  • @CrisLuengo good point. So really the answer is that this is "expected behaviour". Any thoughts on how to get a better box? Alternative approaches? – IntegrateThis Jan 15 '21 at 00:16
  • What do you need the boxes for? Where do the boxes originally come from? Ideally you'd create the boxes for the rotated image, rather than adjust the boxes with the rotation. Maybe these are manual annotations? If so, try to make polygonal annotations that are tight around your object of interest; you'll get tighter boxes if you fit boxes to the polygons after rotation. – Cris Luengo Jan 15 '21 at 00:29
  • @CrisLuengo well it seems that as part of some authors' data augmentation schemes to train more robust networks, they randomly rotate the image. Now this clearly isn't a problem for increments of 90 degrees, but as my picture shows it's not a great annotation for other rotation angles. The annotations, as stated in the post, come from the MS COCO 2014 object detection dataset, where manually re-annotating is clearly out of the question since there are 120K such images with an average of 6 annotations per image. I think I just need a bit more math here to try and get a tighter fit... – IntegrateThis Jan 15 '21 at 00:39
  • I should add that the random rotations may only be used in image classification problems, where there are no bounding boxes. I think for now I might just settle for horizontal/vertical flips. If you want to post your original comment as an answer, I will accept it. – IntegrateThis Jan 15 '21 at 00:41

1 Answer


It seems for the most part this is expected behavior, as per the comments. I do have a somewhat hacky mitigation, though: you can write a function like

import numpy as np

# assuming box_coords = [x_min, y_min, x_max, y_max]
def cropBoxByPercentage(box_coords, image_width, image_height, x_percentage=0.05, y_percentage=0.05):
    box_xmin = box_coords[0]
    box_ymin = box_coords[1]
    box_xmax = box_coords[2]
    box_ymax = box_coords[3]
    box_width = box_xmax - box_xmin + 1
    box_height = box_ymax - box_ymin + 1
    # move each side inward by a fraction of the box size (cropping the box),
    # clamped to the image bounds
    dx = int(x_percentage * box_width)
    dy = int(y_percentage * box_height)
    box_xmin = max(0, box_xmin + dx)
    box_xmax = min(image_width - 1, box_xmax - dx)
    box_ymin = max(0, box_ymin + dy)
    box_ymax = min(image_height - 1, box_ymax - dy)
    return np.array([box_xmin, box_ymin, box_xmax, box_ymax])

Here, x_percentage and y_percentage can be fixed values, or can be computed with some heuristic (for example, as a function of the rotation angle).
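
A minimal usage sketch (the box coordinates, image size, and percentages below are made-up numbers for illustration):

import numpy as np

rotated_box = np.array([50, 40, 210, 180])  # hypothetical loose box after rotation
tight_box = cropBoxByPercentage(rotated_box, image_width=640, image_height=480,
                                x_percentage=0.1, y_percentage=0.1)
print(tight_box)  # -> [66, 54, 194, 166]: each side moved inward by 10% of the box size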

IntegrateThis