
I'd like to detect objects such as game cards in images, and I need to pinpoint their corners very precisely. I wanted to use a framework like Detectron2 or Keras, which support keypoint detection. The problem is that in these frameworks, keypoint order matters. My game cards, however, are doubly symmetric, so there is no way to say which corner is the first, second, third, or fourth.

Do you have any ideas on how to train a neural network for keypoint detection when the order in which the keypoints are returned should not matter?

user2203031

1 Answer


Interesting question!

One idea would be to choose an arbitrary but consistent order for each label you provide. Then, just train a normal object keypoint detector with four keypoints. I personally like the model zoo in mmpose for this.

Here's one example of what I mean by a consistent but arbitrary order. Let's say point 1 is always a corner on one of the shorter edges of the card, chosen so that if you label in clockwise order, the next corner is on the same short edge. There are two corners with this property (one per short edge), so to break the ambiguity, let's also say point 1 is the higher of those two corners. Then we label the rest in clockwise order. Here's a picture of what I mean: [Image: example point ordering]
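A minimal sketch of that labeling rule in plain Python. It assumes the four corners are (x, y) pixel coordinates with y pointing down (so "higher" means smaller y), that they form a convex quadrilateral, and that under perspective distortion the "shorter edges" are the opposite pair with the smaller total length. The name `canonical_order` and the tie-break on x for exactly equal heights are my own illustrative choices, not part of any library.

```python
import math


def canonical_order(corners):
    """Order 4 card corners by the rule above (hypothetical helper, sketch).

    corners: four (x, y) points in image coordinates (y points down),
    in any order. Returns them as [point 1, point 2, point 3, point 4].
    """
    pts = [(float(x), float(y)) for x, y in corners]
    cx = sum(x for x, _ in pts) / 4
    cy = sum(y for _, y in pts) / 4
    # With y pointing down, increasing atan2 angle runs clockwise on screen.
    pts.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # edge[i] is the length of the edge from corner i to corner i + 1.
    edge = [math.dist(pts[i], pts[(i + 1) % 4]) for i in range(4)]
    # Opposite edges come in pairs (0, 2) and (1, 3); the pair with the
    # smaller total length is treated as the two short edges.
    starts = (0, 2) if edge[0] + edge[2] <= edge[1] + edge[3] else (1, 3)
    # Each short edge starts at one candidate corner; take the higher one
    # (smaller y), breaking exact ties by smaller x.
    first = min(starts, key=lambda i: (pts[i][1], pts[i][0]))
    # Rotate the clockwise list so that point 1 comes first.
    return pts[first:] + pts[:first]
```

For a 4×2 landscape card, `canonical_order([(0, 2), (4, 2), (0, 0), (4, 0)])` gives `[(4.0, 0.0), (4.0, 2.0), (0.0, 2.0), (0.0, 0.0)]`: point 1 is the top end of the right-hand short edge, and the result is the same whatever order the corners were annotated in.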

I think this should work reasonably well because most keypoint detection networks are not rotation equivariant (to get rotational equivariance, you usually have to rely on data augmentation), so the network will see differently oriented cards as very different inputs and can learn the orientation-dependent labeling. If you choose to do rotational data augmentation, you'll need to be smart about reordering your keypoints after each rotation so that the ordering still abides by the rules I described.
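A sketch of the reordering step for rotation augmentation, assuming the keypoints already follow the ordering rule and use (x, y) image coordinates with y pointing down. The rotation uses the screen convention where a positive angle is counter-clockwise (the same convention as OpenCV's `getRotationMatrix2D`); the image itself would have to be warped with the same transform by whatever library you use. Because an in-plane rotation keeps both the clockwise order and the edge lengths, the only thing that can change is which of the two candidate corners is now higher, so relabeling reduces to an optional cyclic shift by two. `rotate_points` and `rotate_labels` are hypothetical names.

```python
import math


def rotate_points(points, angle_deg, center):
    """Rotate (x, y) image points about `center`; positive angle is
    counter-clockwise on screen (y axis pointing down)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + c * (x - cx) + s * (y - cy),
             cy - s * (x - cx) + c * (y - cy)) for x, y in points]


def rotate_labels(ordered, angle_deg, center):
    """Rotate keypoints that already follow the ordering rule and re-apply it.

    A rotation keeps the clockwise order and the edge lengths, so the short
    edges are still 1->2 and 3->4. Only the choice between the two candidate
    starting corners (1 or 3) can change.
    """
    pts = rotate_points(ordered, angle_deg, center)
    # Smaller y means higher in the image; exact ties go to smaller x.
    if (pts[2][1], pts[2][0]) < (pts[0][1], pts[0][0]):
        pts = pts[2:] + pts[:2]  # cyclic shift by two positions
    return pts
```

With the card `[(4, 0), (4, 2), (0, 2), (0, 0)]` rotated about its center `(2, 1)`, a rotation of -26° keeps the original corner as point 1, while -27° hands that label to the diagonally opposite corner. That jump is exactly the kind of discontinuity the ordering rule introduces, and a 180° rotation maps the labels back onto the original coordinates.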

kaybee
  • Thanks for the answer. This is an interesting idea, but with my limited number of training examples, rotational data augmentation is a must. So there will be cases where some corner is "1", but after rotating the image by just a few degrees, the opposite corner becomes "1". It seems quite awkward, and also something very hard for a CNN to learn... – user2203031 Apr 26 '23 at 17:36
  • Yes, rotational data augmentation would be possible; you would just need to renumber your keypoints during augmentation. We do the same thing when augmenting by mirroring in tasks that also need to tell left and right apart. In my understanding, this should actually play to what a CNN learns easily, since CNNs are not rotationally equivariant. It is actually more difficult to get them to not care about rotation. – kaybee Apr 26 '23 at 18:25
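The mirroring case can be handled with the same renumbering idea. A sketch, assuming the corners already follow the clockwise ordering rule, (x, y) coordinates with y pointing down, and a continuous-coordinate flip x → width − x (a pixel-index convention might use width − 1 − x instead). Mirroring turns the clockwise order into a counter-clockwise one, so the sequence is reversed first; after that, the short edges start at the old corners 4 and 2, and the higher of the two becomes the new point 1. `hflip_labels` is a hypothetical name, not an mmpose or Detectron2 API.

```python
def hflip_labels(ordered, width):
    """Mirror rule-ordered corners left-right and re-apply the ordering rule."""
    flipped = [(width - x, y) for x, y in ordered]
    # Reversing restores clockwise order; short edges now start at the old
    # corners 4 and 2, giving two candidate labelings.
    a = [flipped[3], flipped[2], flipped[1], flipped[0]]  # start at old corner 4
    b = [flipped[1], flipped[0], flipped[3], flipped[2]]  # start at old corner 2
    # Keep the candidate whose first corner is higher (smaller y, then x).
    return a if (a[0][1], a[0][0]) <= (b[0][1], b[0][0]) else b
```

For example, `hflip_labels([(5, 0), (5, 2), (1, 2), (1, 0)], 10)` returns `[(9, 0), (9, 2), (5, 2), (5, 0)]`, which again starts at the top end of the right-hand short edge.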