How to refine a CoreML image classifier model with an object detection model?

Question

I have an image classifier model created with CreateML.

The labelling in the training set is roughly:

Image contains object A -> label a
Image contains object B -> label b
Image contains object C -> label c
Image contains object A + B -> label a
Image contains object A + B + C -> label c

You could say the objects are "prioritized": object A has a higher priority than B, so label a applies when both are present. Likewise, object C has the highest priority, so label c applies whenever C is present.

This is clearly not optimal for the algorithm, so I would use an object identification algorithm, which seems more appropriate. However, I already have a huge data set of hundreds of thousands of manually and correctly classified images that would not be used to train that algorithm. I would have to build a new training set from scratch for object detection, which is a cost issue and won't reach the size of the existing data set anytime soon.

Is there a way to leverage the existing data set to build an image classification model and augment it with an object detection model that I build manually from scratch, even if that model's data set has only a few hundred items?

score 0 · Accepted Answer · answered Jul 06 '20 at 18:34

One way to solve this is to use multi-label classification, where the model tells you the probability that A is present, the probability that B is present, and the probability that C is present, with each probability independent of the others. Unfortunately, Create ML cannot train this kind of model.
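As a rough illustration of what "independent probabilities" means here, the following plain-Python sketch encodes the objects present in an image as a multi-hot target and reads one probability per label from the model's raw scores. The names `LABELS`, `multi_hot`, and `predict_labels`, the example scores, and the 0.5 threshold are all assumptions made for illustration; none of them come from Create ML or Core ML.

```python
import math

# Hypothetical label set matching the question's objects A, B and C.
LABELS = ["a", "b", "c"]

def multi_hot(present):
    """Encode the set of objects present in an image as a 0/1 vector."""
    return [1.0 if label in present else 0.0 for label in LABELS]

def sigmoid(x):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Score each label independently and keep those at or above the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]

# An image containing A and B gets two positive labels, not just "a".
target = multi_hot({"a", "b"})

# Made-up raw scores for one image; each label is decided on its own.
predicted = predict_labels([2.0, 0.3, -1.5])
```

With these made-up scores, both "a" and "b" clear the threshold, which a single-label classifier could not express.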

Matthijs Hollemans

Wouldn’t that be the same as an object identification model? – Manuel Jul 06 '20 at 18:35
The term "object detection" is used for models that both classify objects and localize them, i.e. they predict bounding boxes. If you can have more than one of your A, B, C per image, and you want to know where they are in the image, then object detection is the kind of model you need. – Matthijs Hollemans Jul 07 '20 at 09:21
So multi-label classification means I don’t have to draw the bounding boxes when creating the data set, and the algorithm will find out on its own which objects each label belongs to? Otherwise, how would the model predict multiple objects in an image? – Manuel Jul 07 '20 at 11:04
With multi-label classification it can tell you there is an A, a B, or a C object in the image (or perhaps more than one) but it won't tell you where they are, just that they are present. This will require you to provide multiple labels for each image, but you don't need to draw bounding boxes. – Matthijs Hollemans Jul 07 '20 at 15:34
A multi-label model is exactly the same as a multi-class model, by the way, except you use sigmoid instead of softmax, and binary cross-entropy instead of categorical cross-entropy as the loss function. – Matthijs Hollemans Jul 07 '20 at 15:34
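The difference described in the comment above can be sketched in plain Python. This is a minimal illustration, not code from Create ML or any training framework: the function names and the example scores are assumptions, and the binary cross-entropy is summed over labels here, whereas frameworks often average it.

```python
import math

def softmax(logits):
    """Multi-class output: probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Multi-label output: each probability stands on its own."""
    return 1.0 / (1.0 + math.exp(-z))

def categorical_cross_entropy(logits, true_index):
    """Loss for exactly one correct class per image."""
    return -math.log(softmax(logits)[true_index])

def binary_cross_entropy(logits, targets):
    """Loss for an independent yes/no decision per label (summed over labels)."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss

# Made-up raw scores for labels a, b, c on one image.
logits = [2.0, 1.5, -1.0]
multi_class_probs = softmax(logits)               # sums to 1
multi_label_probs = [sigmoid(z) for z in logits]  # need not sum to 1
```

Under softmax, a high score for "a" pushes the probability of "b" down. Under sigmoid, both can be high at once, which is what an image containing A and B requires.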
If a multi-label model is almost the same as a multi-class model, is it possible to manually parametrize the CoreML Image Classification model (which is a multi-class model) and convert it to a multi-label model? – Manuel Jul 07 '20 at 15:44
You can turn it into a multi-label model by changing the mlmodel file, but it will be trained with the assumption that it's going to be multi-class, so the predictions won't make any sense. – Matthijs Hollemans Jul 07 '20 at 19:10
