I am writing my own implementation of YOLOv3 and have run into a problem with the loss function. The original paper mentions that it uses binary cross-entropy (BCE) for the class predictions, which is what I did.
I tried reading the original darknet code, but I didn't find anything related to the BCE loss. I also read several implementations in Keras, PyTorch, and TensorFlow, and everyone seems to have their own opinion on the loss function. Some use MSE only for the width and height estimates and BCE for the rest; others use MSE for x, y, w, h and BCE for the rest.
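To make the difference concrete, here is a minimal sketch of the two box-loss variants as I understand them. The function names and the (N, 4) tensor layout are my own assumptions for illustration, not taken from any particular repository, and the x, y values are assumed to already be sigmoid outputs in [0, 1].

```python
import torch
import torch.nn.functional as F


def box_loss_variant_a(pred_xywh, target_xywh):
    """Variant A (hypothetical name): BCE on x, y and MSE on w, h.

    Assumes columns are (x, y, w, h) and that x, y lie in [0, 1].
    """
    loss_xy = F.binary_cross_entropy(pred_xywh[:, :2], target_xywh[:, :2])
    loss_wh = F.mse_loss(pred_xywh[:, 2:], target_xywh[:, 2:])
    return loss_xy + loss_wh


def box_loss_variant_b(pred_xywh, target_xywh):
    """Variant B (hypothetical name): MSE on all of x, y, w, h."""
    loss_xy = F.mse_loss(pred_xywh[:, :2], target_xywh[:, :2])
    loss_wh = F.mse_loss(pred_xywh[:, 2:], target_xywh[:, 2:])
    return loss_xy + loss_wh
```

In both variants the objectness and class terms would still use BCE; only the treatment of the box coordinates differs.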
Here's some of my code:
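# Box terms: MSE over the entries selected by mask
loss_x = self.mse_loss(x[mask], tx[mask])
loss_y = self.mse_loss(y[mask], ty[mask])
loss_w = self.mse_loss(w[mask], tw[mask])
loss_h = self.mse_loss(h[mask], th[mask])
# Confidence term: BCE computed separately over the false and true confidence masks
loss_conf = (
    self.bce_loss(pred_conf[conf_mask_false], tconf[conf_mask_false])
    + self.bce_loss(pred_conf[conf_mask_true], tconf[conf_mask_true])
)
# Class term: ce_loss against class indices taken from the argmax of tcls, scaled by 1 / nB
loss_cls = (1 / nB) * self.ce_loss(pred_cls[mask], torch.argmax(tcls[mask], 1))
loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls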
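For comparison, this is a minimal sketch of how I understand the paper's class loss (independent logistic classifiers trained with BCE) next to the softmax cross-entropy form used in my snippet. The function names are hypothetical, and I am assuming here that the class predictions are raw logits of shape (N, C) and that tcls is a one-hot float tensor; if pred_cls has already been passed through a sigmoid, nn.BCELoss would be the matching call instead.

```python
import torch
import torch.nn as nn


def class_loss_bce(cls_logits, tcls):
    """Per-class BCE: each class is treated as an independent yes/no label.

    Assumes cls_logits are raw scores (no sigmoid applied yet).
    """
    return nn.BCEWithLogitsLoss()(cls_logits, tcls)


def class_loss_ce(cls_logits, tcls):
    """Softmax cross-entropy: classes compete, exactly one is correct."""
    return nn.CrossEntropyLoss()(cls_logits, torch.argmax(tcls, dim=1))


# Tiny usage example with made-up values: 2 boxes, 3 classes.
logits = torch.zeros(2, 3)
tcls = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
bce_value = class_loss_bce(logits, tcls)
ce_value = class_loss_ce(logits, tcls)
```

The two are not interchangeable: the BCE form allows several classes to be active for one box, while the cross-entropy form forces a single class.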
Since the loss function plays an important role in training, I hope someone can help me figure this out.