
I have been using Detectron2 to detect 4 keypoints in each image. My dummy dataset consists of 1000 images, and I applied the following augmentations:

from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data import transforms as T

@classmethod  # override defined on my DefaultTrainer subclass
def build_train_loader(cls, cfg):
    # Augmentations are applied on the fly by the mapper during training.
    augs = [
        T.RandomFlip(prob=0.5, horizontal=True),
        T.RandomFlip(prob=0.5, horizontal=False, vertical=True),
        T.RandomRotation(angle=[0, 180]),
        T.RandomSaturation(0.9, 1.9),
    ]
    return build_detection_train_loader(
        cfg,
        mapper=DatasetMapper(cfg, is_train=True, augmentations=augs),
    )

I have checked the images after applying those transforms (each type of transform was tested separately), and everything looks fine: the keypoints are positioned correctly.
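For reference, a check like this can be done roughly as follows (a minimal sketch, assuming the loader above is defined on a DefaultTrainer subclass called Trainer; the names and details are illustrative):

import cv2
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

loader = Trainer.build_train_loader(cfg)               # the loader with augmentations
metadata = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])
for batch in loader:
    sample = batch[0]
    # DatasetMapper returns "image" as a CHW uint8 tensor in cfg.INPUT.FORMAT (BGR by default)
    img_rgb = sample["image"].permute(1, 2, 0).numpy()[:, :, ::-1]
    vis = Visualizer(img_rgb, metadata=metadata)
    for kpts in sample["instances"].gt_keypoints.tensor.numpy():  # each is (4, 3): x, y, visibility
        vis.draw_and_connect_keypoints(kpts)
    cv2.imwrite("augmented_sample.png", vis.output.get_image()[:, :, ::-1].copy())
    break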

Now, after the training phase (keypoint_rcnn_R_50_FPN_3x.yaml), I get some identical keypoints, i.e. in many images the predicted keypoints overlap. Here are a few samples from my results:

[[[180.4211, 332.8872,   0.7105],
[276.3517, 369.3892,   0.7390],
[276.3517, 366.9956,   0.4788],
[220.5920, 296.9836,   0.9515]]]

And from another image:

[[[611.8049, 268.8926,   0.7576],
[611.8049, 268.8926,   1.2022],
[699.7122, 261.2566,   1.7348],
[724.5556, 198.2591,   1.4403]]]

I have compared the inference results with and without augmentations, and with augmentations the keypoints are barely recognized. How can that be?

Can someone please suggest how to overcome this kind of mistake? What am I doing wrong?

Thank you!

I have added a link to my google colab notebook: https://colab.research.google.com/drive/1uIzvB8vCWdGrT7qnz2d2npEYCqOxET5S?usp=sharing

  • Could you provide a minimal reproducible example so that it's easier to help you? – Zaccharie Ramzi Jul 14 '21 at 17:41
  • @ZaccharieRamzi Yes sure, adding the code for training. – JammingThebBits Jul 14 '21 at 17:43
  • Ideally, it would be a snippet of code (as minimal as possible) that we could just copy paste in a colab to run it and see the error that you are describing. I found that in the past, doing so (i.e. setting the code as such and trying to reduce it down to its minimal essence) allowed me to understand 50% of the errors I was going to ask as questions on SO. The remaining 50% are already in good shape for others to help. – Zaccharie Ramzi Jul 14 '21 at 18:27
  • @Zaccharie Ramzi I have posted the code on google colab and made sure it runs with no errors. you can "run all" the cells and see for yourself. thank you! Link appears now on the original post. – JammingThebBits Jul 14 '21 at 22:03
  • what are the images, and what are these keypoints? is it possible that the keypoints are not well defined under the augmentations? – Shai Jul 15 '21 at 09:41
  • @Shai hi, the images are simple rectangles. I have attached a google colab link; the code generates those images too. Please take a look, thank you. – JammingThebBits Jul 15 '21 at 09:42
  • so, you want the algorithm to recognise the top right corner as "**first** keypoint", the top left corner as "**second** keypoint" etc. Now you flip the image, and you tell the algorithm the first keypoint is the top left. Your labeling is not consistent. – Shai Jul 15 '21 at 09:44
  • The augmentation is done on the fly in memory. I have checked the images with the corresponding keypoints after applying the augmentations, before training, and the keypoints were positioned at the corners correctly. @shai can you *please* elaborate on why my approach is wrong? – JammingThebBits Jul 15 '21 at 09:47

1 Answer


The problem is that there's nothing unique about the different corners of the rectangle. However, in your annotation and in your loss function there is an implicit assumption that the order of the corners is significant:
The corners are labeled in a specific order and the network is trained to output the corners in that specific order.

However, when you augment the dataset by flipping and rotating the images, you change the implicit order of the corners, and now the net does not know which of the four corners to predict in each position.

As far as I can see you have two ways of addressing this issue:

  1. Explicitly force an order on the corners:
    Make sure that no matter what augmentation the image underwent, the ground-truth points of each rectangle are ordered "top left", "top right", "bottom left", "bottom right". This means you'll have to transform the coordinates of the corners (as you are doing now), but also reorder them; see the first sketch after this list.
    Adding this consistency should help your model overcome the ambiguity in identifying the different corners.

  2. Make the loss invariant to the order of the predicted corners:
    Suppose your ground-truth rectangle spans the domain [0, 1]x[0, 1]: the four corners you should predict are [[0, 0], [1, 1], [1, 0], [0, 1]]. Note that if you predict [[1, 1], [0, 0], [0, 1], [1, 0]] your loss is very high, although you predicted the right corners, just in a different order than the annotated ones.
    Therefore, you should make your loss invariant to the order of the predicted points, e.g.

    loss(p, y) = min over permutations pi of sum_i ||p_pi(i) - y_i||^2

    where pi(i) is a permutation of the corners; the second sketch after this list illustrates this.
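For illustration, here are minimal sketches of both options (the helper names are just placeholders, and both assume exactly 4 corner keypoints per instance).

Option 1, a canonical reordering applied to each instance's keypoints after the augmentations have transformed their coordinates (for example in a custom mapper, or around detectron2's transform_keypoint_annotations):

import numpy as np

def reorder_corners(kpts):
    # kpts: (4, 3) array of (x, y, visibility) for one rectangle.
    # Returns the corners ordered top-left, top-right, bottom-left, bottom-right.
    # Note: for rectangles rotated close to 45 degrees the "top two" corners become
    # ambiguous; a rule based on the angle around the box centroid is more robust there.
    kpts = np.asarray(kpts, dtype=np.float32).reshape(4, 3)
    idx = np.argsort(kpts[:, 1])                  # two smallest y values = top corners
    top, bottom = kpts[idx[:2]], kpts[idx[2:]]
    top = top[np.argsort(top[:, 0])]              # left before right
    bottom = bottom[np.argsort(bottom[:, 0])]
    return np.concatenate([top, bottom], axis=0)

Option 2, the min-over-permutations idea sketched in plain PyTorch. Keep in mind that Detectron2's keypoint head is trained on heatmaps with a cross-entropy loss rather than by regressing coordinates, so this only shows the principle; using it would mean replacing that keypoint loss:

import itertools
import torch

def order_invariant_corner_loss(pred, target):
    # pred, target: (N, 4, 2) tensors of predicted / ground-truth corner coordinates.
    # For every rectangle take the minimum squared error over all 4! orderings of the
    # predicted corners, so the loss no longer cares which corner is predicted "first".
    losses = []
    for perm in itertools.permutations(range(4)):
        p = pred[:, list(perm), :]                          # (N, 4, 2)
        losses.append(((p - target) ** 2).sum(dim=(1, 2)))  # (N,)
    return torch.stack(losses, dim=0).min(dim=0).values.mean()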

  • Shai, thank you for pointing this out, I wasn't aware the order of the corners is significant. – JammingThebBits Jul 15 '21 at 15:39
  • @JammingThebBits the order is significant only if you treat it this way. If your loss is invariant to the order, then the trained model will also be invariant. Right now you have a specific order dictated by the labeling and the model fits that specific order. Your augmentations mess this up. – Shai Jul 15 '21 at 15:43
  • I understand. How can I avoid dictating the order? I thought giving the keypoints the same name would treat all keypoints the same: ```MetadataCatalog.get(name="synthetic_" + d).set(keypoint_names=["corner", "corner", "corner", "corner"])``` Weird, I'll read the detectron2 docs again. Thank you again for the great effort you put into explaining this. – JammingThebBits Jul 15 '21 at 15:50
  • @JammingThebBits naming alone won't be enough: it has to be consistent with the augmentations. Otherwise, I suppose, you'd have to make the loss invariant to the order of the corners. – Shai Jul 15 '21 at 15:56
  • No problem, I just wrote a function which orders the keypoints in a specific order, as you suggested. I'll run it on the ground-truth images and on the images the transforms were applied to. ``` for batch in trainer.data_loader: for idx, per_image in enumerate(batch): apply_reorder_keypoints(...) ``` but it seems the loop is infinite; I guess there is another way to apply my rearranging function to the transformed images. I'll look into it. – JammingThebBits Jul 15 '21 at 16:04
  • I have modified the function 'transform_keypoint_annotations' and it worked much better, thank you!! – JammingThebBits Jul 16 '21 at 05:22
  • To avoid future issues like this, should I always order the keypoints in a specific order when dealing with keypoints that have the same meaning, e.g. "corners of a paper"? (I'm writing a few guidelines for myself) – JammingThebBits Jul 16 '21 at 05:33
  • @JammingThebBits you need the task to be well defined: when you have a consistent ordering, your task is "I want the first prediction to be corner A, the second B, etc." When you have a permutation-invariant loss, your task becomes "I want to predict all 4 corners regardless of their order". However, when you have no order and no special loss, the task is not well defined: which of the corners do you want predicted first, etc.? When the task is not well defined, there's no telling what the network will predict. – Shai Jul 16 '21 at 06:25