I have a dataset with 2 labels, and I know there is a strong correlation between them. However, when I use scikit-multilearn's Binary Relevance, which doesn't consider label correlation, I get results very similar to those of the Label Powerset classifier, which does take label correlation into account. Any comments on this? Also, I am using http://scikit-multilearn.github.io/ and I don't know how reliable it is.

Jacek Konieczny
  • There are a few questions: what is the evaluation metric, and what is the experimental procedure? Also, since there are just 2 labels, they are strongly correlated, and you can already learn each label well independently, modelling the additional label may not improve the result. A drop in performance would have been a problem. For example, if you take a binary classification problem, invert the target labels, and add them as a new column, the resulting synthetic multi-label problem will not improve the prediction. – phoxis May 11 '18 at 11:10
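
The synthetic example in the comment above can be sketched in a few lines of plain Python. The target values are made up for illustration and no scikit-multilearn call is involved; the sketch only shows that, when the second label is the inverse of the first, the set of label combinations that a Label Powerset transformation would treat as classes collapses to the original binary problem.

```python
from collections import Counter

# Hypothetical binary target for eight samples (illustrative values only).
y = [0, 1, 1, 0, 1, 0, 0, 1]

# Synthetic two-label problem: the second label is the first one inverted.
Y = [(label, 1 - label) for label in y]

# A Label Powerset transformation treats each distinct label combination
# as one class, so counting the combinations lists its classes.
powerset_classes = Counter(Y)
print(powerset_classes)

# Each observed combination maps one-to-one onto the original binary target,
# so the powerset problem carries no information beyond the first label.
class_to_binary = {combo: combo[0] for combo in powerset_classes}
print(class_to_binary)
```

In this degenerate case both approaches end up learning the same binary decision, which is one way very similar scores can arise.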

1 Answer

I am the author of scikit-multilearn. To answer your questions I would need to see plots of the label combinations. Two labels yield 4 combinations, but if the combinations [1,0] and [0,1] heavily dominate [0,0] and [1,1], then Label Powerset may not be able to properly train the base classifier for the correlation case. It also depends on which measure you are using to evaluate performance.
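
As a rough way to inspect the label combinations before plotting them, the counts can be tabulated directly. This is a minimal sketch in plain Python: the label matrix `Y` and the helper `label_combination_counts` are hypothetical names introduced for illustration and are not part of scikit-multilearn.

```python
from collections import Counter
from itertools import product


def label_combination_counts(Y):
    """Count how often each possible 0/1 label combination occurs in Y."""
    n_labels = len(Y[0])
    observed = Counter(tuple(row) for row in Y)
    # Include combinations that never occur so that gaps are visible.
    return {
        combo: observed.get(combo, 0)
        for combo in product((0, 1), repeat=n_labels)
    }


# Hypothetical label matrix: one row per sample, one column per label.
Y = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0]]

counts = label_combination_counts(Y)
total = sum(counts.values())
for combo, count in counts.items():
    print(combo, count, f"{count / total:.1%}")
```

If [0,0] and [1,1] turn out to be rare or absent, as in this made-up data, a powerset-style classifier has very little to learn from for those classes.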

niedakh