This is not a general question about anchor boxes, Faster R-CNN, or the underlying theory. It is a question about how anchor boxes are implemented in PyTorch, which I am new to. I have read this code, along with a lot of other material in the torchvision repo:
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/anchor_utils.py
Is the "sizes" argument to AnchorGenerator with respect to the original image size, or with respect to the feature map being output from the backbone?
To make this concrete and keep it simple, suppose I am only ever interested in detecting objects that are 32x32 pixels in my input images. My anchor box aspect ratio is therefore definitely 1.0, since height = width. But is the size I pass to AnchorGenerator 32? Or do I need to do some math based on the backbone's downsampling (e.g. my backbone has two 2x2 max-pooling layers with stride 2, so the size I give AnchorGenerator should be 32 / 2^2 = 8)?
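If it helps, here is a minimal sketch of the two configurations I am deciding between. The TinyBackbone, its channel counts, and the 256x256 input are made up purely to keep the example self-contained and runnable; the only real question is which of the two AnchorGenerator configurations is the correct one to pass to the model.

```python
import torch
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator


# Toy backbone standing in for mine: two 2x2 max-pool layers with stride 2,
# so the feature map is 1/4 the spatial size of the input image.
# (Names and channel counts are invented just for this example.)
class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # FasterRCNN requires the backbone to expose out_channels.
        self.out_channels = 32

    def forward(self, x):
        return self.body(x)


# Option A: "sizes" interpreted as pixels in the original input image.
anchor_gen_image_space = AnchorGenerator(
    sizes=((32,),),           # 32x32 objects, measured in input-image pixels?
    aspect_ratios=((1.0,),),  # square boxes, height == width
)

# Option B: "sizes" interpreted in feature-map units,
# i.e. 32 / 2**2 = 8 after the two stride-2 poolings.
anchor_gen_feature_space = AnchorGenerator(
    sizes=((8,),),
    aspect_ratios=((1.0,),),
)

# Which of the two generators is the right one to hand to the model?
model = FasterRCNN(
    TinyBackbone(),
    num_classes=2,
    rpn_anchor_generator=anchor_gen_image_space,  # or anchor_gen_feature_space?
)
model.eval()
with torch.no_grad():
    model([torch.rand(3, 256, 256)])
```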