
Suppose a dataset contains multiple categories, e.g. class 0, class 1 and class 2, and the goal is to classify new samples as either class 0 or non-0.

One can

  1. combine classes 1 and 2 into a single non-0 class and train a binary classifier,
  2. or train a multi-class classifier and map its predictions to 0 / non-0.

How do these two approaches compare in performance?

I think more categories will yield a more accurate decision boundary. However, classes 1 and 2 each carry less weight than a merged non-0 class would, which could result in fewer samples being judged as non-0.

useprxf

1 Answer


Short answer: You would have to try both and see.

Why? It really depends on your data and the algorithm you use (as with many other machine learning questions).

For many classification algorithms (e.g. SVM, logistic regression), multi-class classification is commonly performed as one-vs-all (one-vs-rest) classification, which means the classifier for class 0 treats class 1 and class 2 as the same class. In that case, there is little point in running a multi-class scenario if you just need to separate out class 0.
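To make the one-vs-all point concrete, here is a minimal sketch in plain Python. The helper names `merge_non_zero` and `one_vs_rest_targets` and the label list are made up for illustration and do not come from any library. It only shows that the training targets of the "0 vs rest" sub-problem describe the same split as the merged binary labels.

```python
def merge_non_zero(labels):
    """Approach 1: collapse every non-0 label into a single class, encoded as 1."""
    return [0 if y == 0 else 1 for y in labels]


def one_vs_rest_targets(labels, positive_class):
    """Binary targets for the one-vs-rest sub-problem of `positive_class`."""
    return [1 if y == positive_class else 0 for y in labels]


# Illustrative labels only (assumption, not real data).
labels = [0, 1, 2, 0, 2, 1, 0]

merged = merge_non_zero(labels)
ovr_zero = one_vs_rest_targets(labels, positive_class=0)

# Both encodings split the samples identically; only the polarity is flipped.
same_partition = all(m == 1 - t for m, t in zip(merged, ovr_zero))
print(merged, ovr_zero, same_partition)
```

Note that this equivalence is about the targets of the class-0 sub-problem only. A full one-vs-rest multi-class prediction also compares scores from the class-1 and class-2 sub-classifiers, so its final 0 / non-0 decisions may still differ from those of a single binary model.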

For algorithms such as neural networks, where having multiple output classes is more natural, I think training a multi-class classifier might be more beneficial if your classes 0, 1 and 2 are very distinct. However, this means you would have to choose a more complex model to fit all three classes, although the fit would possibly be better. Therefore, as already mentioned, you would really have to try both approaches and use a good metric to evaluate the performance (e.g. confusion matrices, F-score, etc.).
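As a rough sketch of what "try both and evaluate" could look like once a multi-class model outputs class probabilities, the plain-Python snippet below compares two ways of turning `[P(0), P(1), P(2)]` into a 0 / non-0 decision and scores each with an F-score. The probabilities, the true labels and all function names are invented for illustration; they are not results from any real model.

```python
def argmax_is_non_zero(probs):
    """Pick the most probable class, then report 1 if it is not class 0."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return int(best != 0)


def summed_is_non_zero(probs, threshold=0.5):
    """Report 1 if the total probability of all non-0 classes exceeds the threshold."""
    return int(sum(probs[1:]) > threshold)


def binary_f1(y_true, y_pred, positive=1):
    """F1 score for the `positive` label, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up probabilities [P(0), P(1), P(2)] and true 0 / non-0 labels (assumptions).
probs = [
    [0.80, 0.10, 0.10],
    [0.40, 0.30, 0.30],  # class 0 wins the argmax although P(non-0) = 0.6
    [0.20, 0.50, 0.30],
    [0.45, 0.35, 0.20],
]
y_true = [0, 1, 1, 1]

pred_argmax = [argmax_is_non_zero(p) for p in probs]
pred_summed = [summed_is_non_zero(p) for p in probs]

f1_argmax = binary_f1(y_true, pred_argmax)
f1_summed = binary_f1(y_true, pred_summed)
print(pred_argmax, f1_argmax)
print(pred_summed, f1_summed)
```

On this toy input the argmax rule labels fewer samples as non-0 than the summed rule does, which is the effect the question worries about. Whether that happens on real data, and which rule scores better, is something only an evaluation on held-out samples can show.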

I hope this is somewhat helpful.

Vahe Tshitoyan