I am trying to train a YOLO model. For this purpose I have divided my 224x224 input image into a 14x14 grid.
Suppose there is an object whose centre is at (Bx, By), taking (0, 0) as the top-left corner of the image, and whose width and height are Bw and Bh respectively.
Required_prediction=[Pc,Bx,By,Bw,Bh]
where Pc is the probability that the object is present.
Thus the output of the model will be 14x14x5.
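Since 224 / 14 = 16, each grid cell spans 16x16 pixels, so the cell that holds a given centre follows from integer division. A minimal sketch, assuming pixel coordinates with (0, 0) at the top-left of the image; `cell_of` and the constant names are hypothetical:

```python
# Sketch: find the grid cell containing the object's centre.
# Assumes pixel coordinates with (0, 0) at the top-left of the image.
IMG_SIZE = 224
GRID = 14
CELL = IMG_SIZE // GRID  # 16 pixels per cell


def cell_of(bx, by):
    """Return (row, col) of the cell containing the point (bx, by)."""
    # Clamp so a centre exactly on the right/bottom edge (224) stays in cell 13
    col = min(int(bx // CELL), GRID - 1)
    row = min(int(by // CELL), GRID - 1)
    return row, col


print(cell_of(100, 50))  # (3, 6)
```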
My question is: what should the output label be?
All cells [0,0,0,0,0], and only the cell containing the centre of the object labelled [Pc,Bx,By,Bw,Bh]
OR
All cells [0,0,0,0,0] except the cells covering the whole area of the object, which are labelled [Pc,Bx,...]
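The two labelling options above could be sketched as follows, building the 14x14x5 target as nested Python lists. This is an illustration under assumptions, not a recommendation of either option: the helper names are hypothetical, Bx, By, Bw, Bh are kept in raw pixels, and "covering the object" in the second option is taken to mean any overlap between a cell and the box.

```python
# Sketch of both labelling options as 14x14x5 nested lists.
# Assumptions: pixel coordinates, origin at the top-left; no normalisation;
# option 2 marks every cell that overlaps the box at all.
import math

IMG_SIZE = 224
GRID = 14
CELL = IMG_SIZE // GRID  # 16 pixels per cell


def empty_label():
    # Every cell starts as [0, 0, 0, 0, 0]
    return [[[0.0] * 5 for _ in range(GRID)] for _ in range(GRID)]


def label_centre_cell(pc, bx, by, bw, bh):
    """Option 1: only the cell containing the centre gets [Pc, Bx, By, Bw, Bh]."""
    label = empty_label()
    col = min(int(bx // CELL), GRID - 1)
    row = min(int(by // CELL), GRID - 1)
    label[row][col] = [pc, bx, by, bw, bh]
    return label


def label_covered_cells(pc, bx, by, bw, bh):
    """Option 2: every cell overlapped by the box gets [Pc, Bx, By, Bw, Bh]."""
    label = empty_label()
    # Box edges, clipped to the image
    x1, x2 = max(0.0, bx - bw / 2), min(IMG_SIZE, bx + bw / 2)
    y1, y2 = max(0.0, by - bh / 2), min(IMG_SIZE, by + bh / 2)
    # Cell indices whose pixel span intersects the box on each axis
    cols = range(int(x1 // CELL), min(math.ceil(x2 / CELL), GRID))
    rows = range(int(y1 // CELL), min(math.ceil(y2 / CELL), GRID))
    for r in rows:
        for c in cols:
            label[r][c] = [pc, bx, by, bw, bh]
    return label


a = label_centre_cell(1.0, 100, 50, 40, 20)
b = label_covered_cells(1.0, 100, 50, 40, 20)
print(sum(cell[0] > 0 for row in a for cell in row))  # 1
print(sum(cell[0] > 0 for row in b for cell in row))  # 6
```

With a 40x20 box centred at (100, 50), the first option marks one cell (row 3, column 6) and the second marks six (rows 2 to 3, columns 5 to 7).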
Also, should the centre (Bx, By), along with Bw and Bh, be specified with respect to the top-left corner of the image, or with respect to the grid cell that the centre falls into?
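To compare the two conventions, here is a minimal sketch that expresses one box both ways. It assumes pixel inputs with (0, 0) at the top-left; the function names are hypothetical, and measuring Bw, Bh in cell units in the grid-relative case is only one possible choice.

```python
# Sketch: the same box in image-relative and cell-relative coordinates.
# Assumptions: pixel inputs, origin at the top-left; in the cell-relative form,
# width and height are measured in cell units (an illustrative choice).
IMG_SIZE = 224
GRID = 14
CELL = IMG_SIZE // GRID  # 16 pixels per cell


def relative_to_image(bx, by, bw, bh):
    """All four values as fractions of the whole image (0..1)."""
    return bx / IMG_SIZE, by / IMG_SIZE, bw / IMG_SIZE, bh / IMG_SIZE


def relative_to_cell(bx, by, bw, bh):
    """Centre as an offset inside its grid cell (0..1); width and height in
    cell units, so they exceed 1 for boxes larger than one cell."""
    col = min(int(bx // CELL), GRID - 1)
    row = min(int(by // CELL), GRID - 1)
    x_off = bx / CELL - col
    y_off = by / CELL - row
    return (row, col), (x_off, y_off, bw / CELL, bh / CELL)


print(relative_to_image(112, 56, 56, 28))  # (0.5, 0.25, 0.25, 0.125)
print(relative_to_cell(112, 56, 56, 28))   # ((3, 7), (0.0, 0.5, 3.5, 1.75))
```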