
I have been experimenting with object detection recently, using Faster R-CNN and YOLOv7 to train models on pre-existing datasets.

Using a UNO card dataset, I was able to detect the type of UNO card quite accurately, based on the symbol in the top-left corner. I used an object detection approach, with the cards grouped into only 14 classes.

Based on that, I am wondering what the best approach would be to extend the model to other, more complex card games, such as Munchkin, which has thousands of different cards. For games like this, object detection might not be the best approach, since the model would have to handle thousands of different classes.


The two different approaches I am considering:

Using object detection, create as many classes as there are different cards in the game and train the model to detect every single card individually

or

Using object detection, train the model to detect only the playing card itself, then use each detected card as input for an image classification model that identifies which card it is
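The second approach is a two-stage pipeline: a detector that only knows one generic "card" class, followed by a classifier that runs on each cropped detection. A minimal sketch of the control flow, assuming images are plain 2-D pixel grids and boxes are `(x_min, y_min, x_max, y_max)`; `detect_cards` and `classify_card` are hypothetical stand-ins for the trained detector (e.g. a single-class YOLOv7 model) and classifier, not real library calls.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: an image is a 2-D grid of pixel values and a box is
# (x_min, y_min, x_max, y_max) in pixel coordinates, max values exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def detect_then_classify(
    image: Image,
    detect_cards: Callable[[Image], Sequence[Box]],
    classify_card: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 finds generic 'card' boxes; stage 2 names each cropped card."""
    results = []
    for box in detect_cards(image):
        results.append((box, classify_card(crop(image, box))))
    return results


# Toy stand-ins for the two trained models (hypothetical).
def fake_detector(img: Image) -> Sequence[Box]:
    return [(0, 0, 4, 4), (4, 2, 8, 6)]


def fake_classifier(card: Image) -> str:
    # Pretend a pixel value of 7 is the distinguishing mark of "Card A".
    return "Card A" if any(7 in row for row in card) else "Card B"


image = [[0] * 8 for _ in range(6)]
image[1][1] = 7
print(detect_then_classify(image, fake_detector, fake_classifier))
# [((0, 0, 4, 4), 'Card A'), ((4, 2, 8, 6), 'Card B')]
```

Because the detector never needs to know which card it found, only the classifier has to change when new cards are added.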

As I see it, both methods have pros and cons:

The first approach might be much more accurate, as it detects each card individually. On the other hand, it seems to me that it needs considerably more classes, and enough training data for each of those classes. It might also be difficult to expand the model with new cards, as you would have to retrain it every time.
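To put a rough number on "considerably more classes": in an anchor-based YOLO-style head, each anchor predicts 4 box coordinates, 1 objectness score and one score per class, so the output grows linearly with the class count. This sketch assumes YOLOv7's default P5 layout (3 anchors at each of the strides 8, 16 and 32, 640×640 input); other models and configs will differ.

```python
def values_per_anchor(num_classes: int) -> int:
    # 4 box coordinates + 1 objectness score + one score per class
    return 4 + 1 + num_classes


def head_channels(num_classes: int, anchors_per_scale: int = 3) -> int:
    # Output channels of the final 1x1 conv at each detection scale
    return anchors_per_scale * values_per_anchor(num_classes)


def predictions_per_image(num_classes: int, img_size: int = 640,
                          strides=(8, 16, 32), anchors_per_scale: int = 3) -> int:
    # Grid cells summed over all scales, times anchors, times values per anchor
    cells = sum((img_size // s) ** 2 for s in strides)
    return cells * anchors_per_scale * values_per_anchor(num_classes)


for nc in (1, 14, 1000):
    print(nc, head_channels(nc), predictions_per_image(nc))
# 1 18 151200
# 14 57 478800
# 1000 3015 25326000
```

The output layer itself only grows linearly; the heavier cost is collecting enough labelled examples for every one of those classes.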

The second approach might not be as accurate, as the detector might not only detect playing cards but also mistake other objects for playing cards. On the flip side, it seems to me that it is much easier to expand the model with new cards.
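One way the second stage can be made easy to expand is nearest-neighbour retrieval (a technique swapped in here for illustration, not something the question proposes): compare each crop's feature vector against one stored reference vector per card, so adding a card means storing one more vector instead of retraining. The same similarity score can double as a rejection threshold for crops that match no known card, which addresses the false-positive concern. A minimal sketch, assuming some pretrained CNN already turns a crop into an embedding; `CardGallery`, the vectors, the card names and the threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CardGallery:
    """Hypothetical reference store: one embedding per known card."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.refs: Dict[str, List[float]] = {}

    def add_card(self, name: str, embedding: Sequence[float]) -> None:
        # Expanding the card set is just storing another reference vector.
        self.refs[name] = list(embedding)

    def identify(self, embedding: Sequence[float]) -> Optional[str]:
        best_name, best_sim = None, -1.0
        for name, ref in self.refs.items():
            sim = cosine(embedding, ref)
            if sim > best_sim:
                best_name, best_sim = name, sim
        # Reject crops that resemble no known card (e.g. detector false positives).
        return best_name if best_sim >= self.threshold else None


gallery = CardGallery(threshold=0.9)
gallery.add_card("Card A", [1.0, 0.0, 0.0])
gallery.add_card("Card B", [0.0, 1.0, 0.0])
print(gallery.identify([0.9, 0.1, 0.0]))  # Card A
print(gallery.identify([0.0, 0.0, 1.0]))  # None
gallery.add_card("Card C", [0.0, 0.0, 1.0])
print(gallery.identify([0.0, 0.0, 1.0]))  # Card C
```

A conventional classifier on the crops also works; it then needs its last layer retrained whenever cards are added, which is still far cheaper than retraining the detector.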


What might be the best approach here? Is there a different, more efficient approach you would suggest?

Pallemann

1 Answer


Between these two options, I would go with the second one; in my view the pros outweigh the cons. It is certainly much easier to scale, and that is a valuable point if you want to extend the model to other card games. I would also suggest trying plain image classification. I am not sure it can outperform the second option (I suspect it can't), but it could be faster, and if the results are still good, why not give it a go? A standard multi-label CNN is worth trying, I think.
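"Multi-label" here means the network's last layer scores every card independently with a sigmoid, so one photo can be tagged with several cards at once, whereas an ordinary multi-class head uses a softmax and picks exactly one. A minimal sketch of the difference, assuming the logits come from the final layer of whichever CNN is used; the logit values, card names and 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def multi_class(logits: Sequence[float], names: Sequence[str]) -> str:
    # Exactly one card per image: take the most probable class.
    probs = softmax(logits)
    return names[max(range(len(names)), key=probs.__getitem__)]


def multi_label(logits: Sequence[float], names: Sequence[str],
                threshold: float = 0.5) -> List[str]:
    # Any number of cards per image: each class is a separate yes/no decision.
    return [n for n, z in zip(names, logits) if sigmoid(z) >= threshold]


names = ["Card A", "Card B", "Card C"]
logits = [2.0, 1.5, -3.0]
print(multi_class(logits, names))  # Card A
print(multi_label(logits, names))  # ['Card A', 'Card B']
```

The trade-off is that plain classification says which cards are in the photo but not where they are, which is fine if locating the cards is not required.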