Why are multi-label performance results the same as independent ones despite strong label correlation?

Question

I have a dataset with 2 labels, and I know there is a strong correlation between these 2 labels. However, when I use scikit-multilearn's binary relevance, which doesn't consider label correlation, I get very similar results to the Label Powerset classifier, which does consider it. Any comments on this? Also, I use http://scikit-multilearn.github.io/ and I don't know how reliable it is.
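One way to make this comparison concrete is to fix the data, the base classifier and the metrics, and then fit both transformations side by side. The sketch below does this by hand with scikit-learn instead of calling scikit-multilearn: binary relevance fits one `LogisticRegression` per label, and label powerset maps each distinct label combination to one class of a multiclass problem. The synthetic dataset (label 2 is a noisy copy of label 1), the hypothetical helper names `fit_binary_relevance` / `fit_label_powerset`, and the choice of Hamming loss and subset accuracy as metrics are all assumptions for illustration, not the setup from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split


def make_correlated_labels(seed=0):
    """Assumed synthetic data: label 2 copies label 1 except for ~10% random flips."""
    X, y1 = make_classification(n_samples=2000, n_features=10, random_state=seed)
    rng = np.random.default_rng(seed)
    flip = rng.random(len(y1)) < 0.1
    y2 = np.where(flip, 1 - y1, y1)
    return X, np.column_stack([y1, y2])


def fit_binary_relevance(X, Y):
    """Hypothetical helper: one independent binary classifier per label column."""
    return [LogisticRegression(max_iter=1000).fit(X, Y[:, j]) for j in range(Y.shape[1])]


def predict_binary_relevance(models, X):
    return np.column_stack([m.predict(X) for m in models])


def fit_label_powerset(X, Y):
    """Hypothetical helper: each distinct label row (e.g. [1, 0]) becomes one class."""
    combos, classes = np.unique(Y, axis=0, return_inverse=True)
    model = LogisticRegression(max_iter=1000).fit(X, classes.reshape(-1))
    return model, combos


def predict_label_powerset(model_and_combos, X):
    model, combos = model_and_combos
    # Map predicted class ids back to their label combinations.
    return combos[model.predict(X)]


X, Y = make_correlated_labels()
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

results = {}
for name, fit, predict in [
    ("binary relevance", fit_binary_relevance, predict_binary_relevance),
    ("label powerset", fit_label_powerset, predict_label_powerset),
]:
    Y_pred = predict(fit(X_tr, Y_tr), X_te)
    # Hamming loss: fraction of individual labels predicted wrongly (lower is better).
    # accuracy_score on 2-D indicator arrays is subset accuracy: the whole row must match.
    results[name] = (hamming_loss(Y_te, Y_pred), accuracy_score(Y_te, Y_pred))
    print(f"{name}: hamming loss={results[name][0]:.3f}, "
          f"subset accuracy={results[name][1]:.3f}")
```

Using the same base classifier for both keeps the transformation as the only difference. Because label powerset predicts whole combinations, any gap between the two is more likely to show up in subset accuracy than in Hamming loss, so reporting both metrics helps answer the "which measure" question raised below.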

There are a few questions: what is the evaluation metric, and what is the experimental process? Also, as there are just 2 labels and they are strongly correlated, if you are already able to learn the labels well independently, the additional label may not improve the result. A drop in performance would have been a problem. For example, if you have a binary classification problem, and you invert the target labels and add them as a new column, this specific synthetic multi-label problem will not help to improve the prediction. — phoxis, May 11 '18 at 11:10
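The inverted-label example in the comment can be checked directly: if the second column is just `1 - y`, the label powerset transformation yields only two classes, `(0, 1)` and `(1, 0)`, which correspond one-to-one to the original binary target. A small sketch with made-up labels; the helper name `label_powerset_classes` is hypothetical.

```python
def label_powerset_classes(rows):
    """Hypothetical helper: map each distinct label combination to an integer
    class id, assigned in order of first appearance."""
    ids = {}
    classes = [ids.setdefault(tuple(r), len(ids)) for r in rows]
    return classes, ids


y = [0, 1, 1, 0, 1, 0, 0, 1]        # made-up binary target
Y = [[v, 1 - v] for v in y]          # synthetic second label: the inverse of the first

classes, mapping = label_powerset_classes(Y)
print(mapping)        # {(0, 1): 0, (1, 0): 1}
print(classes == y)   # True: the powerset problem is the original binary problem
```

With only two combinations present, label powerset has nothing to learn beyond the original binary problem, and binary relevance learns the same problem twice (once inverted), so both should score about the same.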

score 1 · Answer 1 · answered Feb 18 '16 at 16:23

I am the author of scikit-multilearn. To answer your questions I would need to see plots of the label combinations. Two labels yield 4 combinations, but if the combinations [1,0] and [0,1] heavily dominate the [0,0] or [1,1] cases, then Label Powerset may not be able to properly learn the base classifier for the correlated case. It also depends on which measure you are using to evaluate performance.
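Before comparing the two methods, it is worth tabulating how often each of the four combinations occurs, which is what the answer asks for. A minimal sketch with `collections.Counter`, assuming the labels are available as a list of `[label1, label2]` rows; the sample data and the helper name `combination_counts` are made up.

```python
from collections import Counter


def combination_counts(rows):
    """Hypothetical helper: count each 2-label combination, including absent ones."""
    counts = Counter(tuple(r) for r in rows)
    all_combos = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return {combo: counts[combo] for combo in all_combos}


# Made-up label matrix for 10 samples.
Y = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0],
     [0, 1], [1, 0], [1, 1], [0, 1], [1, 0]]

counts = combination_counts(Y)
total = sum(counts.values())
for combo, n in counts.items():
    print(f"{list(combo)}: {n} ({n / total:.0%})")
# [0, 0]: 0 (0%)
# [0, 1]: 4 (40%)
# [1, 0]: 5 (50%)
# [1, 1]: 1 (10%)
```

In this made-up sample `[0, 0]` never occurs and `[1, 1]` is rare, so a Label Powerset base classifier would have almost no examples of those classes to learn from, which is the situation the answer warns about.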
