In the R-CNN paper (the "slow" R-CNN), the goal of bounding-box regression is to learn a transformation that maps a proposed bounding box P to a ground-truth box G. The transformation is parameterized in terms of four functions dx(P), dy(P), dw(P), dh(P).
The first two specify a scale-invariant translation of the center of P's bounding box, while the
last two specify log-space translations of the width and height of P's bounding box relative to the object proposal.
The same technique is used in the Fast R-CNN paper for bounding-box prediction.
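For concreteness, here is my own minimal sketch of that parameterization (following Appendix C of the R-CNN paper), assuming boxes are given as (center-x, center-y, width, height) tuples; the function names `encode`/`decode` are just illustrative:

```python
import math

def encode(P, G):
    """Compute regression targets t = (tx, ty, tw, th) from proposal P
    to ground truth G. Boxes are (cx, cy, w, h).
    Dividing the center offset by Pw/Ph makes the translation
    scale-invariant; taking log of the size ratio puts the
    width/height offsets in log space."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    tx = (Gx - Px) / Pw
    ty = (Gy - Py) / Ph
    tw = math.log(Gw / Pw)
    th = math.log(Gh / Ph)
    return tx, ty, tw, th

def decode(P, t):
    """Apply predicted deltas t to proposal P (inverse of encode)."""
    Px, Py, Pw, Ph = P
    tx, ty, tw, th = t
    Gx = Pw * tx + Px
    Gy = Ph * ty + Py
    Gw = Pw * math.exp(tw)
    Gh = Ph * math.exp(th)
    return Gx, Gy, Gw, Gh
```

By construction, `decode(P, encode(P, G))` recovers G exactly, and the same targets t result if both P and G are scaled by a common factor, which is what "scale-invariant" means here.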
Question 1: Could anyone help me understand why the translation is scale-invariant and why the width/height offsets are in log space, and how these four functions capture those two properties?
Question 2: How is the scale-invariant translation of the bounding box described above different from achieving scale-invariant object detection (explained below)?
What I mean is that in the Fast R-CNN paper the author points out two ways to achieve scale invariance in object detection:
First, the brute-force approach: each image is processed at a pre-defined pixel size during both training and testing, so the network must learn scale-invariant object detection directly from the training data.
The second approach uses image pyramids, which provide approximate scale invariance by presenting each image to the network at multiple resolutions.
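To make the image-pyramid idea concrete, here is a small sketch of my own: for each scale, the image is resized so that its shorter side matches a target size. The particular scale values below are just example settings, not necessarily the ones used in the paper:

```python
def image_pyramid(img_w, img_h, scales=(480, 576, 688, 864, 1200)):
    """Yield (width, height) pairs for a multi-scale image pyramid.
    Each scale resizes the shorter side of the image to that value,
    preserving aspect ratio. The scale values here are illustrative."""
    for s in scales:
        ratio = s / min(img_w, img_h)
        yield round(img_w * ratio), round(img_h * ratio)
```

At test time a detector using this approach runs on every pyramid level (or picks the level where the object is closest to a canonical size), which is how the pyramid supplies scale invariance that the network itself does not have to learn.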
Please feel free to cite the relevant research papers so that I can read them for an in-depth understanding.