Beginner at machine learning here! I'd like to get a sense of how I should approach a classification problem. Given that the problem at hand is to classify whether an object belongs to class A or class B, I am wondering whether I should use a generative or a discriminative model. I have 2 questions.

  1. A discriminative model seems to do a better job at classification problems because it is purely concerned with how the decision boundary is drawn and nothing else.

Q: However, with a small dataset of around 80 class A objects and fewer than 10 class B objects to train and test on, would a discriminative model overfit, so that a generative model would perform better?

  2. Also, with such a large difference between the numbers of class A and class B objects, the trained model is likely to pick up only on class A objects. Even if the model classifies all objects as class A, this would still result in a very high accuracy score.

Q: Any ideas on how to reduce this bias, given that there is no way of increasing the size of class B's dataset?
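
To make the accuracy concern in the second point concrete, here is a minimal sketch in plain Python. The counts 80 and 10 are assumptions for illustration (the actual class B count is only known to be below 10), and the "model" is a hypothetical one that always predicts class A. It compares plain accuracy with per-class recall and balanced accuracy (the mean of the per-class recalls), which is one common way to score imbalanced problems.

```python
# Illustrative counts only: the question says "around 80" class A and
# "fewer than 10" class B objects, so 80 and 10 are assumptions.
y_true = ["A"] * 80 + ["B"] * 10
y_pred = ["A"] * len(y_true)  # hypothetical model that always predicts A


def recall(y_true, y_pred, label):
    """Fraction of the samples whose true class is `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_a = recall(y_true, y_pred, "A")
recall_b = recall(y_true, y_pred, "B")
balanced_accuracy = (recall_a + recall_b) / 2

print(f"accuracy          = {accuracy:.3f}")           # 0.889
print(f"recall (class B)  = {recall_b:.3f}")           # 0.000
print(f"balanced accuracy = {balanced_accuracy:.3f}")  # 0.500
```

Under these assumed counts the always-A model reaches about 89% accuracy while never finding a single class B object, so a metric such as class B recall or balanced accuracy is probably a more informative target than accuracy here.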

kmario23
Amoroso
  • 1
    This seems more appropriate to stats SE. – Dr. Snoopy Apr 04 '17 at 07:17
  • 1
    To deal with your imbalanced data, you can always oversample the smaller class. Oversampling can be done either by duplicating your class B objects about 8 times or by more advanced methods such as SMOTE! Then you can try a discriminative algorithm! – TrnKh Apr 04 '17 at 14:39
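
The duplication idea from the comment above could look roughly like the sketch below. It is plain Python and only covers random oversampling by duplication for a two-class problem; the function name `oversample_minority`, the placeholder features, and the 80/10 counts are all assumptions for illustration, not something given in the question. SMOTE itself is not shown (the imbalanced-learn package provides it as `imblearn.over_sampling.SMOTE`).

```python
import random


def oversample_minority(samples, labels, minority_label, seed=0):
    """Randomly duplicate minority samples until both classes have the same size.

    Assumes a two-class problem; `samples` and `labels` are parallel lists.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    majority_count = len(labels) - len(minority)
    # Number of extra copies needed (zero if the classes are already balanced).
    n_extra = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(samples) + extra, list(labels) + [minority_label] * n_extra


# Placeholder training data: one dummy feature per object, 80 A and 10 B (assumed).
train_x = [[float(i)] for i in range(90)]
train_y = ["A"] * 80 + ["B"] * 10

bal_x, bal_y = oversample_minority(train_x, train_y, "B")
print(bal_y.count("A"), bal_y.count("B"))  # 80 80
```

One caveat with this approach: the duplication should be applied only to the training split, after the train/test split has been made. Otherwise copies of the same class B object can end up in both sets and make the test score look better than it is.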

0 Answers