Crop image using pred boxes coordinates

Question

I used detectron2 to get predictions of where an object is located in an image. Now I'm trying to use the prediction boxes to crop the image (in my use case there is only 1 object/box detected per image). The relevant part of my code is below. The issue is that it only crops the left side of the image, but I need it to crop the top, right and bottom too, so that the result is cropped to the detected object's box. The original images have shape (x, y, 3), i.e. they are three-channel color images. What am I missing?

import cv2
from detectron2.utils.visualizer import Visualizer
from google.colab.patches import cv2_imshow  # Colab's replacement for cv2.imshow

# `predictor` (a DefaultPredictor) and `test_metadata` are set up earlier in my notebook
imageName = "my_img.jpg"
image = cv2.imread(imageName)  # BGR NumPy array of shape (height, width, 3)
outputs = predictor(image)
v = Visualizer(image[:, :, ::-1], metadata=test_metadata, scale=0.8)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])

boxes = outputs["instances"].pred_boxes
boxes = list(boxes)[0].detach().cpu().numpy()

# extract the bounding box coordinates
(x, y) = (int(boxes[0]), int(boxes[1]))
(w, h) = (int(boxes[2]), int(boxes[3]))
crop_img = image[x:y+h, y:x+w]
cv2_imshow(crop_img)

I also tried the following but it trimmed too much of the image from the top and didn't trim the right or bottom of the image at all.

from detectron2.data.transforms import CropTransform

ct = CropTransform(x, y, w, h)
crop_img = ct.apply_image(image)
cv2_imshow(crop_img)

Playing around with it, I was able to crop the image around the detected box with the following, but it's not ideal since the offsets are hardcoded.

crop_img = image[y-40:y+h-390, x:x+w-395]

@TimRoberts that actually crops more than the box on the left and right, and doesn't crop the top/bottom of the image at all. — eTothEipiPlus1, Jun 16 '21 at 20:09
Of course. Silly me. The arrays produced by OpenCV are Y-major. `crop_img = image[y:y+h, x:x+w]`. — Tim Roberts, Jun 16 '21 at 20:10
@TimRoberts I'm still missing something because `image[y:y+h, x:x+w]` cropped the left side correctly, the top too much (it actually cropped under the box) and didn't crop the right or bottom at all. Any ideas? — eTothEipiPlus1, Jun 16 '21 at 20:13
@TimRoberts I was playing around with it and the following gets me pretty close to what I want, the cropped image around the box that was detected: `crop_img = image[y-40:y+h-390, x:x+w-395]`. But I want to be able to do this without all the hardcoded values. I'm not sure what I'm missing but if you have any thoughts, I'd love to hear them. — eTothEipiPlus1, Jun 16 '21 at 20:28
Are you absolutely sure that `boxes` has `(x, y, w, h)`, and not `(x0, y0, x1, y1)`, which would be more usual in this case? — Tim Roberts, Jun 16 '21 at 20:48
@TimRoberts that could be my issue. I can do some research to see what detectron outputs for the prediction box. Do you know how I can crop using `(x0, y0, x1, y1)`? I'm getting back into image manipulation so forgive me if this is straightforward but what points are `x0` vs `x1` on the image/box? — eTothEipiPlus1, Jun 16 '21 at 21:07
If that's the case, then think of the tuple as (left, top, right, bottom). So it would be as simple as `crop_img = image[y0:y1, x0:x1]` — Tim Roberts, Jun 17 '21 at 04:29
@TimRoberts I guess that's not it unfortunately. It's still not cropping the image properly. It's been difficult for me to find documentation on what type of coordinates pred_boxes detectron2 outputs. Maybe someone else might know. Thanks for all your help. — eTothEipiPlus1, Jun 17 '21 at 20:56
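To illustrate the two points raised in the comments (NumPy image arrays are indexed `[row, column]`, i.e. `[y, x]`, and a box in (x0, y0, x1, y1) form is not (x, y, w, h)), here is a minimal NumPy-only sketch on a synthetic image. The helper `crop_xyxy`, the image size and the box values are hypothetical, made up for illustration, and the sketch assumes the box really is (x0, y0, x1, y1).

```python
import numpy as np


def crop_xyxy(image: np.ndarray, box) -> np.ndarray:
    """Crop an image array with a box given as (x0, y0, x1, y1).

    NumPy image arrays are indexed [row, column], i.e. [y, x],
    so the y-range comes first in the slice.
    """
    x0, y0, x1, y1 = (int(v) for v in box)
    return image[y0:y1, x0:x1]


# Hypothetical 480 x 640 image (height x width) with a white "object"
img = np.zeros((480, 640, 3), dtype=np.uint8)
img[100:300, 200:450] = 255  # rows 100..299, columns 200..449

box = (200.0, 100.0, 450.0, 300.0)  # (x0, y0, x1, y1)
crop = crop_xyxy(img, box)
print(crop.shape)  # (200, 250, 3)
print(crop.min())  # 255, so the crop holds only the object

# Misreading the same box as (x, y, w, h) overshoots the right and bottom edges
x, y, w, h = (int(v) for v in box)
wrong = img[y:y + h, x:x + w]  # rows 100:400, columns 200:650 (clipped to 640)
print(wrong.shape)  # (300, 440, 3)
```

The overshooting crop is consistent with the "didn't crop the right or bottom" symptom described in the comments: the left edge lands on the box, but the far edges are treated as a width and height and run past it.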

score 3 · Accepted Answer · answered Sep 01 '21 at 19:23

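The following should work. Detectron2's `pred_boxes` hold each box as (x0, y0, x1, y1), i.e. the top-left and bottom-right corners in absolute pixel coordinates, not as (x, y, w, h).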

def crop_object(image, box):
    """Crops an object in an image

    Inputs:
      image: PIL image
      box: one box from Detectron2 pred_boxes, as (x0, y0, x1, y1)
    """
    # The box holds the top-left and bottom-right corners
    x_top_left = box[0]
    y_top_left = box[1]
    x_bottom_right = box[2]
    y_bottom_right = box[3]

    # PIL's Image.crop takes a (left, upper, right, lower) tuple
    crop_img = image.crop((int(x_top_left), int(y_top_left), int(x_bottom_right), int(y_bottom_right)))
    return crop_img

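import cv2
from PIL import Image

# Get pred_boxes from Detectron2 prediction outputs
boxes = outputs["instances"].pred_boxes
# Select 1 box:
box = list(boxes)[0].detach().cpu().numpy()
# crop_object expects a PIL image, but cv2.imread returns a BGR NumPy array, so convert it first
pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# Crop the PIL image using predicted box coordinates
crop_img = crop_object(pil_image, box)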
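If you'd rather stay with the NumPy array from `cv2.imread` instead of converting to PIL, the same crop is a plain slice. The sketch below is not part of the answer: `crop_all` is a hypothetical helper that crops every detection, rounds fractional coordinates outward (floor/ceil) and clamps them to the image bounds. It assumes the boxes arrive as an (N, 4) array of (x0, y0, x1, y1); in Detectron2 that should be obtainable from the underlying tensor of the `Boxes` object, e.g. `outputs["instances"].pred_boxes.tensor.cpu().numpy()`.

```python
import numpy as np


def crop_all(image: np.ndarray, boxes_xyxy: np.ndarray) -> list:
    """Return one crop per row of an (N, 4) array of (x0, y0, x1, y1) boxes."""
    h, w = image.shape[:2]
    crops = []
    for x0, y0, x1, y1 in boxes_xyxy:
        # Round outward so partially covered edge pixels are kept,
        # then clamp to the image bounds (rows are y, columns are x).
        left = max(int(np.floor(x0)), 0)
        top = max(int(np.floor(y0)), 0)
        right = min(int(np.ceil(x1)), w)
        bottom = min(int(np.ceil(y1)), h)
        crops.append(image[top:bottom, left:right])
    return crops


# Assumed Detectron2 usage (not run here):
#   boxes_xyxy = outputs["instances"].pred_boxes.tensor.cpu().numpy()
img = np.zeros((480, 640, 3), dtype=np.uint8)
boxes_xyxy = np.array(
    [[200.4, 100.6, 449.5, 299.2],   # fractional box inside the image
     [600.0, 400.0, 700.0, 500.0]],  # box running past the right/bottom edges
    dtype=np.float32,
)
crops = crop_all(img, boxes_xyxy)
print([c.shape for c in crops])  # [(200, 250, 3), (80, 40, 3)]
```

Rounding outward is a design choice; use plain `int()` instead if you want the same truncation that `crop_object` applies.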
