Does anchor box size get refined when training object detection models like Faster R-CNN, YOLO, and SSD?

Question

I was learning how the object detection models Faster R-CNN, YOLOv3, and SSD work, and I got confused about anchor box size refinement.

Your question is not very clear. Can you provide more detail on what "box size refining" means? — Louis Lac, Apr 16 '21 at 18:37
Anchors are simply a set of reference boxes that indicate possible objects. In YOLO, after the feature map is extracted, the image is divided into a grid, and each grid cell is associated with anchor boxes of varying aspect ratios. If an object is present within a cell, these default boxes/anchors give an objectness score along with bounding box coordinates. Similarly, SSD has 8732 default boxes, and Faster R-CNN has anchor boxes that give an objectness score if an object is present; this happens in the RPN proposal generator. — B.Thushar Marvel, Apr 17 '21 at 03:31
My question is whether the default box/anchor box size gets updated during training (backpropagation) or remains the same? — B.Thushar Marvel, Apr 17 '21 at 03:32

score 1 · Answer 1 · answered Apr 17 '21 at 13:18

Of course, the anchor box is refined during training. It's the only way the network can learn to predict accurate boxes and correct any localization errors. The network learns offsets that refine the anchor box in shape and size.
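To make the idea of offsets concrete, here is a minimal sketch of how predicted offsets are typically applied to an anchor, assuming the box parameterization described in the Faster R-CNN paper. The function name `decode_box` and the example numbers are purely illustrative and are not taken from any library. Note that in this formulation the anchor tuple is a fixed input; only the offsets would come from the network.

```python
import math

def decode_box(anchor, offsets):
    """Turn a fixed anchor plus predicted offsets into a refined box.

    anchor  = (xa, ya, wa, ha): centre x, centre y, width, height
    offsets = (tx, ty, tw, th): values the network would predict
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    x = xa + tx * wa        # shift the centre by a fraction of the anchor width
    y = ya + ty * ha        # shift the centre by a fraction of the anchor height
    w = wa * math.exp(tw)   # rescale the width multiplicatively
    h = ha * math.exp(th)   # rescale the height multiplicatively
    return (x, y, w, h)

# Hypothetical anchor, for illustration only.
anchor = (100.0, 100.0, 64.0, 128.0)

# Zero offsets reproduce the anchor itself.
print(decode_box(anchor, (0.0, 0.0, 0.0, 0.0)))

# Non-zero offsets move and resize the predicted box; the anchor is untouched.
print(decode_box(anchor, (0.25, -0.5, math.log(2.0), 0.0)))
```

During training, the regression loss compares these offsets against the offsets that would map the anchor onto the ground-truth box, so the gradients update the weights that produce `offsets`.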

You can read more about how anchor boxes work here

score 1 · Accepted Answer · answered Apr 19 '21 at 08:00

Of course, anchor box size (and position) gets refined during training. As you said in a comment, anchor boxes are nothing more than a set of reference boxes placed at fixed positions on the output grid. Each grid cell additionally predicts the objectness score, the class label, and the exact coordinates of the bounding box.

These coordinates correspond to the box size refining you are talking about. The implementation of this regression differs between networks (SSD, YOLO, Faster R-CNN, ...).

I encourage you to read the literature, especially the YOLO papers, which are very clear. In "YOLO9000: Better, Faster, Stronger" (available for free online), bounding box refining is explained in great detail on page 3.
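As a rough illustration of that YOLO9000 scheme, the sketch below decodes one prediction using the equations from the paper: the centre is a sigmoid-bounded offset from the grid cell's top-left corner, and the width and height are exponential scalings of the anchor prior. The function names and the example values are my own and only illustrative; units here are grid cells, which is an assumption about how the prior is expressed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_yolo_box(t, cell, prior):
    """Decode one YOLO9000-style box prediction.

    t     = (tx, ty, tw, th): raw network outputs
    cell  = (cx, cy): top-left corner of the grid cell
    prior = (pw, ph): width and height of the anchor prior
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx    # centre stays inside the responsible cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)   # prior width rescaled by the prediction
    bh = ph * math.exp(th)   # prior height rescaled by the prediction
    return (bx, by, bw, bh)

# Hypothetical cell and prior, for illustration only.
print(decode_yolo_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(1.5, 2.0)))
print(decode_yolo_box((1.0, -1.0, 0.5, -0.5), cell=(3, 4), prior=(1.5, 2.0)))
```

With all-zero outputs the box sits at the centre of the cell with exactly the prior's size; any other output moves or resizes the predicted box while `prior` itself stays the same.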

Of course, all of this is learnt during training. Take a look at the YOLO loss function on page 4 of the paper "You Only Look Once: Unified, Real-Time Object Detection".
