This is not a general question about anchor boxes, Faster R-CNN, or the underlying theory. It is a question about how anchor boxes are implemented in PyTorch, which I am new to. I have read this code, along with a lot of other material in the torchvision repo:
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/anchor_utils.py
Is the "sizes" argument to AnchorGenerator with respect to the original image size, or with respect to the feature map being output from the backbone?
To make this concrete and keep it simple, suppose I am only ever interested in detecting objects that are 32x32 pixels in my input images. My anchor box aspect ratio is therefore definitely 1.0, since height = width. But is the size I pass to AnchorGenerator 32? Or do I need to do some math based on the backbone's downsampling (e.g. my backbone has two 2x2 max-pooling layers with stride 2, so the size I give AnchorGenerator should be 32 / 2^2 = 8)?
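If it helps, here is a minimal sketch of the two configurations I am deciding between. The TinyBackbone, its channel counts, and the 256x256 input are made up purely to keep the example self-contained and runnable; the only real question is which of the two AnchorGenerator configurations is the correct one to pass to the model.

```python
import torch
from torch import nn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator


# Toy backbone standing in for mine: two 2x2 max-pool layers with stride 2,
# so the feature map is 1/4 the spatial size of the input image.
# (Names and channel counts are invented just for this example.)
class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # FasterRCNN requires the backbone to expose out_channels.
        self.out_channels = 32

    def forward(self, x):
        return self.body(x)


# Option A: "sizes" interpreted as pixels in the original input image.
anchor_gen_image_space = AnchorGenerator(
    sizes=((32,),),           # 32x32 objects, measured in input-image pixels?
    aspect_ratios=((1.0,),),  # square boxes, height == width
)

# Option B: "sizes" interpreted in feature-map units,
# i.e. 32 / 2**2 = 8 after the two stride-2 poolings.
anchor_gen_feature_space = AnchorGenerator(
    sizes=((8,),),
    aspect_ratios=((1.0,),),
)

# Which of the two generators is the right one to hand to the model?
model = FasterRCNN(
    TinyBackbone(),
    num_classes=2,
    rpn_anchor_generator=anchor_gen_image_space,  # or anchor_gen_feature_space?
)
model.eval()
with torch.no_grad():
    model([torch.rand(3, 256, 256)])
```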