I'm using Detectron2 for an instance segmentation task in which I need to classify objects into 4 classes, so I started from the COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml config. I applied four kinds of augmentation transforms, and after training the total loss is about 0.1.
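For context, this is roughly my training setup (a minimal sketch; the dataset names and the specific augmentations below are placeholders for my registered datasets and my four transforms, and dataset registration is omitted):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
from detectron2.data import DatasetMapper, build_detection_train_loader
import detectron2.data.transforms as T

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("my_dataset_train",)   # placeholder names
cfg.DATASETS.TEST = ("my_dataset_test",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4          # my four classes

class MyTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # Four augmentation transforms (illustrative choices, not my exact ones)
        mapper = DatasetMapper(cfg, is_train=True, augmentations=[
            T.ResizeShortestEdge(short_edge_length=(640, 800),
                                 max_size=1333, sample_style="range"),
            T.RandomFlip(),
            T.RandomBrightness(0.8, 1.2),
            T.RandomContrast(0.8, 1.2),
        ])
        return build_detection_train_loader(cfg, mapper=mapper)

trainer = MyTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```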
However, on some test images the bounding boxes are not accurate: the predicted box is either too large or too small, or it doesn't cover the whole object.
In addition, the predictor sometimes draws several bounding boxes for a single object, as if it were detecting multiple separate objects.
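For reference, this is roughly how I run the predictor on a test image (a minimal sketch assuming DefaultPredictor; the file name and score threshold are placeholders, not my exact values):

```python
import os
import cv2
from detectron2.engine import DefaultPredictor

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # example threshold
predictor = DefaultPredictor(cfg)

image = cv2.imread("test_image.jpg")          # BGR image, as Detectron2 expects
outputs = predictor(image)
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes)    # sometimes several boxes for one object
print(instances.pred_classes)
print(instances.scores)
```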
Are there any suggestions on how to improve its accuracy?
Are there any good-practice approaches for resolving this issue?
Any suggestion or reference material will be helpful.