
I am trying to train a YOLO model. For this purpose I have divided my 224*224 input image into a 14*14 grid.

Now suppose there is an object whose centre is at (Bx, By), taking (0, 0) as the top-left corner of the image, and whose width and height are Bw and Bh respectively.

Required_prediction=[Pc,Bx,By,Bw,Bh]

where Pc is the probability that the object is present.

Thus the output of the model will be 14*14*5.
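Since 224 / 14 = 16, each grid cell covers a 16*16 pixel patch, and the cell an object belongs to follows from integer division of its centre coordinates. A minimal Python sketch; the helper name `cell_of` is hypothetical, rows are counted top to bottom and columns left to right.

```python
def cell_of(bx: float, by: float, img_size: int = 224, grid: int = 14):
    """Return (row, col) of the grid cell that contains the centre (bx, by).

    (bx, by) are pixel coordinates with (0, 0) at the top-left of the image.
    """
    cell = img_size / grid  # 224 / 14 = 16.0 pixels per cell
    # Clamp so a centre lying exactly on the right/bottom edge maps to the
    # last cell instead of the out-of-range index `grid`.
    col = min(int(bx // cell), grid - 1)
    row = min(int(by // cell), grid - 1)
    return row, col


# An object centred at x = 100, y = 50 falls in row 3, column 6.
print(cell_of(100, 50))  # (3, 6)
```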

My question is: what should the output label be?

All cells [0,0,0,0,0], except the cell containing the centre of the object, which is labelled [pc,bx,by,bw,bh]
OR
All cells [0,0,0,0,0], except every cell covered by the object, which are all labelled [pc,bx, . . . ]

Also:

For bx, by, bw, bh: should the object's centre be specified relative to the top-left corner of the image, or relative to the grid cell it falls into?
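For reference, the original YOLO (v1) formulation uses a mixed convention: the centre is encoded as an offset within its grid cell, while the width and height are normalized by the whole image. Assuming that convention, with cell size s = 224 / 14 = 16 pixels and image size W = H = 224 (the symbols c_x, c_y, x, y, w, h are introduced here for illustration):

$$
c_x = \left\lfloor \frac{b_x}{s} \right\rfloor, \quad
c_y = \left\lfloor \frac{b_y}{s} \right\rfloor, \quad
x = \frac{b_x}{s} - c_x, \quad
y = \frac{b_y}{s} - c_y, \quad
w = \frac{b_w}{W}, \quad
h = \frac{b_h}{H}
$$

For example, a 64*32 box centred at (100, 50) lands in column c_x = 6 and row c_y = 3, with x = 100/16 - 6 = 0.25, y = 50/16 - 3 = 0.125, w = 64/224 ≈ 0.286 and h = 32/224 ≈ 0.143. With this choice x and y stay in [0, 1) and w, h in (0, 1], so all four regression targets are on a similar scale.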

hitesh kumar

1 Answer


Labelling all cells [0,0,0,0,0] and only the cell containing the center of the object as [pc,bx,by,bw,bh] is the right choice, given that you divided the image into a 14*14 grid.
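A minimal sketch of that labelling scheme in plain Python, assuming the YOLOv1-style coordinates (centre as an offset inside its cell, width and height as a fraction of the image). The function name `build_target` and the (bx, by, bw, bh) pixel input format are hypothetical: every cell starts as [0, 0, 0, 0, 0] and only the cell containing each object's centre is filled in.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (bx, by, bw, bh) in pixels, origin top-left


def build_target(boxes: Sequence[Box], img_size: int = 224,
                 grid: int = 14) -> List[List[List[float]]]:
    """Return a grid x grid x 5 nested list of [pc, x, y, w, h] labels."""
    cell = img_size / grid
    # Every cell starts as "no object": [0, 0, 0, 0, 0].
    target = [[[0.0] * 5 for _ in range(grid)] for _ in range(grid)]
    for bx, by, bw, bh in boxes:
        col = min(int(bx // cell), grid - 1)
        row = min(int(by // cell), grid - 1)
        # Only the cell containing the centre is responsible for the object.
        target[row][col] = [
            1.0,              # pc: an object is present
            bx / cell - col,  # x offset within the cell, in [0, 1)
            by / cell - row,  # y offset within the cell, in [0, 1)
            bw / img_size,    # width relative to the whole image
            bh / img_size,    # height relative to the whole image
        ]
    return target


t = build_target([(100, 50, 64, 32)])
print([round(v, 3) for v in t[3][6]])  # [1.0, 0.25, 0.125, 0.286, 0.143]
print(sum(c[0] for row in t for c in row))  # 1.0
```

If two objects' centres fall into the same cell, this sketch keeps only the last one; that is a limitation of assigning a single box per cell rather than something the code works around.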

In real-world problems, however, the image is often split at several grid sizes to handle objects of different sizes: for example, you might split the image into 14*14, 8*8 and 4*4 grids.

  • Suppose there are one or two objects of interest in the picture. Then I am labelling only 2 of the 14*14 cells with [pc,bx,by.....], which means my model could easily learn to predict that there is nothing in the image, since that is true for all but 2 of the 196 cells – hitesh kumar Jul 05 '20 at 19:00
  • That's true, and I think that's what you want: the model needs to learn whether there is an object in the first place, which is why you are using `pc`. Take care that the loss function is different when `pc` is 1 than when `pc` is 0. – Jul 05 '20 at 19:06
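One common way to make the loss depend on `pc` is the YOLOv1-style weighting: coordinate errors count only in cells whose label has an object, and the confidence error in empty cells is down-weighted so the many background cells do not swamp the few positive ones. A rough sketch for a single [pc, x, y, w, h] prediction per cell, assuming the weights lambda_coord = 5 and lambda_noobj = 0.5 used in the YOLOv1 paper. The function name `yolo_like_loss` is hypothetical, and the actual YOLOv1 loss also takes square roots of w and h and includes class-probability terms, both omitted here.

```python
from typing import List

Cell = List[float]  # [pc, x, y, w, h]


def yolo_like_loss(pred: List[List[Cell]], target: List[List[Cell]],
                   lambda_coord: float = 5.0, lambda_noobj: float = 0.5) -> float:
    """Sum-squared loss whose terms depend on whether the target cell has pc = 1."""
    loss = 0.0
    for pred_row, tgt_row in zip(pred, target):
        for p, t in zip(pred_row, tgt_row):
            if t[0] == 1.0:
                # Object present: penalize the box coordinates (weighted up)
                # and the confidence error.
                loss += lambda_coord * sum((p[i] - t[i]) ** 2 for i in range(1, 5))
                loss += (p[0] - t[0]) ** 2
            else:
                # No object (target pc = 0): only push pc toward 0, with a
                # reduced weight; box values predicted in empty cells are ignored.
                loss += lambda_noobj * p[0] ** 2
    return loss


# 2*2 toy grid with one object in the top-right cell.
target = [[[0.0] * 5, [1.0, 0.5, 0.5, 0.5, 0.5]], [[0.0] * 5, [0.0] * 5]]
all_zero = [[[0.0] * 5 for _ in range(2)] for _ in range(2)]
print(yolo_like_loss(all_zero, target))  # 6.0
```

In this toy case, predicting "nothing" everywhere is not free: the single missed object costs 6.0, while each false positive in an empty cell would cost only 0.5. That asymmetry is the point of treating `pc` = 1 and `pc` = 0 cells differently.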