
I am trying to figure out how to build a neural network in which, let's say, I have 3 output labels (A, B, C).

Now my data consists of rows in which two of the labels can be 1, e.g. A and B are 1 and C is 0. I want to train my neural network so that it predicts A or B. I don't want it trained to output high probabilities for both A and B (as in multi-label problems); I want only one of them.

The reason for this is that the rows with 1 in both A and B are essentially "don't care" rows, in which predicting either A or B counts as correct. So I don't want the neural network to settle into a minimum where it tries to predict both A and B.

Is it possible to train a neural network like this?

  • Your label space consists of 8 distinct sets of labels, from `0 0 0` to `1 1 1`, of which `1 1 0` and `1 1 1` are don't-care cases. In the remaining 6 cases, do you care about predicting `A` and `B` both correctly, or any one of them consistently over the dataset, or just any one of them without consistency? For example, if the label is `0 1 0`, do you care whether it predicts `1 1 *`, `0 0 *`, or `0 1 *` (* means either 0 or 1)? Without consistency means that, given the same example `0 1 0`, on one batch it may predict `0 0 *`, on the next `1 1 *`, and on the next `0 1 *`. Please explain this. – Autonomous Apr 12 '18 at 18:44
  • If the label is `[0,1,0]`, then I would expect the system to predict `[0,1,0]` correctly, as there is only one answer. If the label is `[1,1,0]`, then either `[1,*,0]` or `[*,1,0]` is correct, and no consistency is needed. – user9637850 Apr 13 '18 at 04:12

2 Answers


TL;DR:

  • A typical network gives you a probability for each class.
  • How you interpret those probabilities is up to you.
  • Equal outputs in a single-label scenario mean both labels are equally likely.

The typical implementation of a multi-class classifier with neural networks uses a softmax layer, with one output per class.

If you want a single-label classifier, treat the output with the maximum value as the selected label. The value of that output relative to the others is a measure of the confidence in the prediction.

In case of (near-)equality, both outputs are equally likely.
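A minimal NumPy sketch of this interpretation (the logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# hypothetical network outputs (logits) for classes A, B, C
probs = softmax(np.array([2.0, 1.9, -1.0]))
labels = ["A", "B", "C"]
pred = labels[int(np.argmax(probs))]   # single-label prediction: take the max
# A and B having near-equal probability signals the network
# cannot confidently distinguish between them
```

Here the argmax picks "A", but the small gap between the A and B probabilities is exactly the "both equally likely" signal described above.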

Ophir Yoktan

Using a per-sample weight is the best approach I can think of for your application.

Define a weight w for each sample such that w = 0 if A = 1 and B = 1, and w = 1 otherwise. Then define your loss function as:

w * (CE(A) + CE(B)) + w' * min(CE(A), CE(B)) + CE(C)

where CE(A) is the cross-entropy loss for label A, and w' denotes the complement of w (w' = 1 - w). The loss function is simple to understand: it tries to predict both A and B correctly when they are not both 1; otherwise, it only needs to predict either A or B correctly. Note that which of A and B gets predicted correctly cannot be known in advance, and it may not be consistent across batches. The model will always try to predict class C correctly.
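Assuming per-label sigmoid outputs with binary cross-entropy (the answer doesn't pin down a framework), this loss could be sketched in PyTorch as follows; the function name and the `[A, B, C]` column order are my own choices, not from the answer:

```python
import torch
import torch.nn.functional as F

def dont_care_loss(logits, targets):
    """Sketch of the weighted loss above.

    logits, targets: tensors of shape (batch, 3), columns ordered [A, B, C].
    On "don't care" rows (A = B = 1), only the better-predicted of A/B
    contributes; on all other rows, both A and B contribute. C always does.
    """
    # per-element cross-entropy, CE(A), CE(B), CE(C) for each sample
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    ce_a, ce_b, ce_c = ce[:, 0], ce[:, 1], ce[:, 2]
    # w = 0 iff A = 1 and B = 1, else w = 1
    w = 1.0 - targets[:, 0] * targets[:, 1]
    per_sample = w * (ce_a + ce_b) + (1.0 - w) * torch.minimum(ce_a, ce_b) + ce_c
    return per_sample.mean()
```

On a don't-care row, confidently predicting only A and confidently predicting both A and B yield essentially the same loss, which is exactly the intended behaviour.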

If you are already using your own weights to indicate sample importance, you should multiply the entire expression above by that weight.

However, I wouldn't be surprised if you get similar (or even better) performance with the classic multi-label loss function. Assuming each label combination occurs in equal proportion, only in 1/8th of cases are you allowing the network to predict either A or B; in all other cases, the network has to predict all three labels correctly. Simpler loss functions usually work better.
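For contrast, the classic multi-label baseline is just an independent sigmoid plus binary cross-entropy per label; the snippet below (with made-up logits, and the `[A, B, C]` column order again assumed) shows that on a don't-care row it penalizes predicting only one of A/B, which is what the weighted loss above avoids:

```python
import torch
import torch.nn.functional as F

targets = torch.tensor([[1.0, 1.0, 0.0]])   # a "don't care" row: A and B both 1

only_a = torch.tensor([[5.0, -5.0, -5.0]])  # confidently predicts A only
both = torch.tensor([[5.0, 5.0, -5.0]])     # confidently predicts A and B

# plain multi-label BCE, averaged over all labels
loss_only_a = F.binary_cross_entropy_with_logits(only_a, targets)
loss_both = F.binary_cross_entropy_with_logits(both, targets)
# loss_only_a is much larger: plain BCE punishes the single-label prediction
```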

Autonomous