I built a CNN for multi-label classification, i.e. predicting multiple labels per image.
I noticed that ImageNet and many other datasets are structured as a set of examples per label: given a label, there is a list of images for that label (i.e. label -> list of images). Keras, which I'm using, also supports this structure: a folder per label, and inside each folder a list of images serving as examples of that label.
The problem I'm concerned about is that many images may actually contain multiple labels. For example, if I'm classifying general objects, a folder named 'Cars' will contain images of cars, but some of those images will also contain people (and may therefore hurt the results on the class 'People').
My first question: 1) Can this (i.e. a single label per image in the ground truth) reduce the potential accuracy of the network?
If so, I thought of instead creating a dataset of the form: image1, {list of its labels}; image2, {list of its labels}; etc.
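As a minimal sketch of what that per-image label list could look like in code (the file names, class names, and helper function below are hypothetical, purely illustrative): each image's label list is encoded as a multi-hot vector, which is the usual target format for multi-label training.

```python
import numpy as np

# Hypothetical label vocabulary (assumption, not from any real dataset).
classes = ["Cars", "People", "Trees"]
class_index = {name: i for i, name in enumerate(classes)}

# Proposed structure: image -> list of its labels.
dataset = [
    ("image1.jpg", ["Cars", "People"]),  # a car photo that also contains people
    ("image2.jpg", ["Trees"]),
]

def to_multi_hot(labels, class_index, num_classes):
    """Encode a list of labels as a multi-hot target vector."""
    vec = np.zeros(num_classes, dtype=np.float32)
    for label in labels:
        vec[class_index[label]] = 1.0
    return vec

# One target row per image; multiple 1s per row are allowed.
targets = np.stack(
    [to_multi_hot(labels, class_index, len(classes)) for _, labels in dataset]
)
print(targets)
# Row 0 marks both 'Cars' and 'People'; row 1 marks only 'Trees'.
```

With targets like these, the network's final layer would typically use an independent sigmoid per class with binary cross-entropy loss, rather than the softmax and categorical cross-entropy used for single-label classification.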
2) Will such a structure produce better results?
3) Is there a good academic paper about this?