I have a dataset with 2 labels, and I know there is a strong correlation between them. However, when I use scikit-multilearn's Binary Relevance classifier, which does not take label correlation into account, I get results very similar to those of the Label Powerset classifier, which does. Any comments on this? Also, I am using http://scikit-multilearn.github.io/ and I don't know how reliable it is.
Why are multi-label performance results the same as the independent ones despite strong label correlation?

Jacek Konieczny

sarah daneshvar
A few questions: what is the evaluation metric, and what is the experimental procedure? Also, since there are only 2 labels, they are strongly correlated, and you can already learn each label well independently, the additional label may not improve the result. A drop in performance would have been a problem. For example, if you take a binary classification problem, invert the target labels, and add them as a new column, this synthetic multi-label problem will not improve the prediction. – phoxis May 11 '18 at 11:10
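The inverted-label example from the comment can be made concrete with a small sketch. The toy targets `y1` and `y2` below are hypothetical, not taken from the asker's data; the point is only that when one label is fully determined by the other, the label combinations collapse to two classes, so a Label Powerset transformation arguably poses the same binary problem that Binary Relevance already solves.

```python
from collections import Counter

# Hypothetical toy targets: label 2 is the exact inverse of label 1.
y1 = [0, 1, 1, 0, 1, 0, 0, 1]
y2 = [1 - v for v in y1]

# Label Powerset treats every distinct label combination as one class,
# so count which combinations actually occur.
powerset_classes = Counter(zip(y1, y2))
print(powerset_classes)

# Only (0, 1) and (1, 0) appear: the powerset problem has two classes,
# which map one-to-one onto the values of y1 alone.
assert set(powerset_classes) == {(0, 1), (1, 0)}
```

With real data the correlation is rarely this perfect, so this is only the limiting case of the argument.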
1 Answer
I am the author of scikit-multilearn. To answer your questions I would need to see plots of the label combinations. Two labels yield 4 combinations, but if the combinations [1,0] and [0,1] heavily dominate [0,0] and [1,1], then Label Powerset may not be able to properly train its base classifier for the correlation case. It also depends on which measure you are using to evaluate performance.
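As a rough way to check both points, here is a minimal sketch in plain Python that tabulates the four label combinations and compares two common multi-label measures. The helper names (`combination_counts`, `hamming_loss`, `subset_accuracy`) and the toy `Y_true` / `Y_pred` arrays are assumptions made for illustration; they are not part of scikit-multilearn and not the asker's data.

```python
from collections import Counter
from itertools import product


def combination_counts(Y):
    """Count how often each of the 2**n_labels label combinations occurs."""
    n_labels = len(Y[0])
    counts = Counter(tuple(row) for row in Y)
    # Include combinations that never occur, with a count of 0.
    return {combo: counts.get(combo, 0)
            for combo in product((0, 1), repeat=n_labels)}


def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label assignments that are wrong."""
    wrong = sum(t != p
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (len(Y_true) * len(Y_true[0]))


def subset_accuracy(Y_true, Y_pred):
    """Fraction of samples whose whole label combination is predicted exactly."""
    exact = sum(tuple(row_t) == tuple(row_p)
                for row_t, row_p in zip(Y_true, Y_pred))
    return exact / len(Y_true)


# Hypothetical data: [1,0] and [0,1] dominate, [1,1] and [0,0] are rare.
Y_true = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]]
# Hypothetical predictions that miss only the two rare combinations.
Y_pred = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]

print(combination_counts(Y_true))
print(hamming_loss(Y_true, Y_pred))
print(subset_accuracy(Y_true, Y_pred))
```

In this toy case the two measures give different pictures of the same predictions, because Hamming loss scores each label separately while subset accuracy requires the whole combination to match. That is one reason the choice of measure matters when comparing Binary Relevance with Label Powerset.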

niedakh