These oversimplified example target vectors (in my use case, each 1 represents a product that a client bought at least once a month)

[1,1,1,0,0,0,0,0,1,0,0,0]
[1,1,1,0,0,0,0,0,0,0,0,0]
[0,1,0,0,0,0,1,0,0,0,0,0]
[1,0,1,0,0,0,0,0,0,0,0,0]
[1,1,1,0,0,0,0,0,1,0,0,0]
[1,1,0,0,0,0,0,0,0,0,0,0]
[1,1,0,0,0,1,0,0,0,0,1,0]

contain labels that are far sparser than others. That is, the target vectors contain some products that are almost always bought and many that are seldom bought.

During training, the ANN (for activation, the input layer uses sigmoid and so does the output layer; the loss function is binary_crossentropy; which features are used to predict the target vector is, I think, not really relevant here) only learns that putting 1 in the first three labels and 0 in the rest is good. Obviously, I want the model not to learn this pattern. As a side note, I am more interested in true positives in the sparse labels than in the frequent labels. How should I handle this issue?
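For reference, a minimal sketch of the kind of model I mean (the layer sizes and the number of input features here are made up, not my actual values):

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 20  # placeholder; my real feature count differs
n_labels = 12    # length of the target vectors shown above

model = keras.Sequential([
    layers.Dense(32, activation="sigmoid", input_shape=(n_features,)),  # sigmoid activation
    layers.Dense(n_labels, activation="sigmoid"),  # one sigmoid unit per label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```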

My only idea would be to exclude the frequent labels from the target vectors entirely, but that would be my last resort.

  • Not a *programming* question, hence off-topic here; please see the intro and NOTE in https://stackoverflow.com/tags/machine-learning/info – desertnaut Sep 09 '22 at 17:54
  • @desertnaut by "off-topic here", do you mean that I should post the same question rather in [Cross Validated](https://stats.stackexchange.com/) than in stackoverflow? Or do you refer to the tags within stackoverflow? – Viktor Sep 10 '22 at 08:53
  • *Please do see* the suggested link! "Here" means SO. – desertnaut Sep 10 '22 at 12:06

1 Answer


There are two things I would try in this situation:

  1. Add dropout layers (or some other layers that decrease the dependence on particular neurons).
  2. Use oversampling or undersampling techniques. Oversampling would increase the amount of data from under-represented classes, while undersampling would decrease the amount of data from over-represented classes. A sketch of both ideas follows this list.
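Here is a rough sketch of both ideas in Keras/NumPy (the layer sizes, dropout rate, and rarity threshold are assumptions you would need to tune for your data):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_labels = 20, 12  # assumed sizes matching the question

# 1. Dropout between the dense layers reduces the dependence on
#    particular neurons:
model = keras.Sequential([
    layers.Dense(32, activation="sigmoid", input_shape=(n_features,)),
    layers.Dropout(0.3),  # drop 30% of activations during training
    layers.Dense(n_labels, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 2. Naive oversampling: duplicate every sample whose target vector
#    contains at least one rare label (positive in < 10% of samples).
def oversample_rare(X, y, factor=3, rare_threshold=0.1):
    label_freq = y.mean(axis=0)               # positive rate per label
    rare = label_freq < rare_threshold        # boolean mask of rare labels
    has_rare = (y[:, rare] == 1).any(axis=1)  # samples with a rare label
    X_extra = np.repeat(X[has_rare], factor - 1, axis=0)
    y_extra = np.repeat(y[has_rare], factor - 1, axis=0)
    return np.concatenate([X, X_extra]), np.concatenate([y, y_extra])
```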

But overall, I think regularization would be more effective.
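For instance, besides dropout, you could add an explicit L2 weight penalty to the dense layers (the factor here is just a starting value, not a recommendation):

```python
from tensorflow.keras import layers, regularizers

# Same dense layer as above, with an L2 penalty on its weights:
dense = layers.Dense(
    32,
    activation="sigmoid",
    kernel_regularizer=regularizers.l2(1e-4),  # assumed strength; tune it
)
```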
