Not quite an answer.
I've seen similar effects, and I think all of the parameters, and how you train the model, matter. For example, with more layers (resnet34 vs. resnet18 for the backbone) you need more data to train the bigger network, and that is where augmentations become useful.
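For instance, here is a minimal sketch of how I'd swap the backbone depth with torchvision's detection builders (the exact arguments vary between torchvision versions, e.g. newer releases use weights= instead of pretrained=; num_classes=2 is just a placeholder):

```python
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Smaller backbone: fewer parameters, converges with less data/augmentation.
backbone = resnet_fpn_backbone('resnet18', pretrained=True)

# Bigger backbone: more capacity, but usually wants more data or heavier augmentation.
# backbone = resnet_fpn_backbone('resnet34', pretrained=True)

model = MaskRCNN(backbone, num_classes=2)  # placeholder: one object class + background
```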
Another example is network resolution. I trained it with the defaults (min_size=800, max_size=1333) at some learning rate; with a higher resolution, the network AP has more room to grow aggressively at a higher LR. Yet another related example is how many "levels" you have in your FPN and what the grid settings are for the AnchorGenerator. If your augmentations generate objects smaller than the anchors on a particular FPN level, they will probably cause more problems than they solve. And if your augmentations generate samples so small that the details of your object are no longer visible, that is again not very useful, especially with small networks.
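As a rough sketch (torchvision again, with hypothetical values), the resolution and the anchor grid are both exposed on the model constructor; the key point is that you need one anchor size tuple per FPN level, and those sizes should stay smaller than the objects your augmentations actually produce:

```python
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.rpn import AnchorGenerator

backbone = resnet_fpn_backbone('resnet18', pretrained=True)  # returns 5 feature maps by default

# One size tuple per FPN level; objects smaller than the smallest anchor
# on a level are effectively wasted on that level.
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)

model = MaskRCNN(
    backbone,
    num_classes=2,                  # placeholder: one object class + background
    min_size=1024, max_size=1700,   # raised from the 800/1333 defaults
    rpn_anchor_generator=anchor_generator,
)
```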
There are tons of similar small issues that matter. I had a situation where rotations made the result worse: at some rotation angles the rotated sample started to look like part of the background, and the Mask R-CNN-based detector failed on it. Cubic interpolation fixed it a little, but eventually I came up with the idea of limiting the rotation angle.
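If it helps, here is roughly how that limited-angle rotation could look with albumentations (hypothetical values; any augmentation library with an angle limit and interpolation option would work, and masks passed alongside the image get the same transform):

```python
import cv2
import albumentations as A

# Keep rotations small so the rotated object doesn't start to look like background,
# and use cubic interpolation to preserve fine details of the object.
train_transform = A.Compose(
    [
        A.Rotate(limit=15, interpolation=cv2.INTER_CUBIC, p=0.5),
        A.HorizontalFlip(p=0.5),
    ],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']),
)
```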
Just experiment and find the hyperparameters that work well for your particular task.