Beginner at machine learning here! I'd just like to get a sense of how I should approach a classification problem. Given that the problem at hand is to classify whether an object belongs to class A or class B, I am wondering whether I should use a generative or a discriminative model. I have 2 questions.
- A discriminative model seems to do a better job at classification problems because it is purely concerned with how the decision boundary is drawn and nothing else.
Q: However, with a small dataset of around 80 class A objects and fewer than 10 class B objects to train and test on, would a discriminative model overfit, so that a generative model would actually perform better?
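To make the generative/discriminative distinction concrete, here is a minimal sketch of a generative classifier. It assumes a single numeric feature per object, and the feature values are hypothetical, not taken from the question. The model estimates p(x | y) as one Gaussian per class plus a prior p(y) from class frequencies, then classifies with Bayes' rule. With several features and an independence assumption, this becomes Gaussian naive Bayes. Fitting it only requires per-class means, variances and frequencies, but with fewer than 10 class B objects those estimates will still be noisy.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_generative(xs, ys):
    """Fit p(x | y) as a 1-D Gaussian per class and p(y) from class frequencies."""
    params = {}
    n = len(ys)
    for label in set(ys):
        class_xs = [x for x, y in zip(xs, ys) if y == label]
        # A small variance floor keeps the density finite when a class has
        # very few (or identical) samples, as can happen with a tiny class B.
        var = max(pvariance(class_xs), 1e-6)
        params[label] = (mean(class_xs), var, len(class_xs) / n)
    return params


def log_joint(x, mu, var, prior):
    # log p(x | y) + log p(y) for a Gaussian class-conditional density
    return (-0.5 * math.log(2 * math.pi * var)
            - (x - mu) ** 2 / (2 * var)
            + math.log(prior))


def predict(params, x):
    # Bayes' rule: choose the class with the largest p(x | y) p(y)
    return max(params, key=lambda label: log_joint(x, *params[label]))


# Hypothetical one-feature data: 8 class A objects and 3 class B objects.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 3.0, 3.2, 2.8]
ys = ["A"] * 8 + ["B"] * 3
params = fit_gaussian_generative(xs, ys)
print(predict(params, 1.05), predict(params, 3.1))  # A B
```

The log p(y) prior term favours class A whenever the two class-conditional densities are comparable. This means class imbalance feeds directly into a generative model's decisions.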
- Also, with such a large difference between the number of class A objects and class B objects, the trained model is likely to pick up only on class A objects. Even if the model classified every object as class A, it would still achieve a very high accuracy score.
Q: Any ideas on how to reduce this bias, given that there is no way to increase the size of the class B dataset?
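The accuracy problem described above, and two common ways of counteracting it, can be sketched in plain Python. The labels are hypothetical: 80 class A objects and 8 class B objects, since the question only says "fewer than 10". Balanced accuracy (the mean of per-class recall) exposes an always-A classifier that plain accuracy rewards. The class-weight formula, n_samples / (n_classes * class_count), is the heuristic scikit-learn uses for `class_weight="balanced"`. Random oversampling, which duplicates minority samples with replacement, is a separate, simpler technique shown here as one option, not the only one.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall, so class B counts as much as class A.
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def balanced_class_weights(y):
    # Weight each class by n_samples / (n_classes * class_count), so errors
    # on the rare class cost more during training.
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def random_oversample(xs, ys, seed=0):
    # Duplicate minority-class samples (drawn with replacement) until every
    # class matches the majority count. Apply to the training split only.
    rng = random.Random(seed)
    counts = Counter(ys)
    target = max(counts.values())
    out_x, out_y = list(xs), list(ys)
    for label, c in counts.items():
        pool = [x for x, y in zip(xs, ys) if y == label]
        extra = rng.choices(pool, k=target - c)
        out_x.extend(extra)
        out_y.extend([label] * len(extra))
    return out_x, out_y


# Hypothetical labels: 80 class A objects and 8 class B objects.
y_true = ["A"] * 80 + ["B"] * 8
always_a = ["A"] * len(y_true)
print(round(accuracy(y_true, always_a), 3))   # 0.909
print(balanced_accuracy(y_true, always_a))    # 0.5
print(balanced_class_weights(y_true))         # {'A': 0.55, 'B': 5.5}

x_train = list(range(len(y_true)))  # stand-in feature values
x_bal, y_bal = random_oversample(x_train, y_true)
print(Counter(y_bal))
```

If you oversample, do it after splitting into train and test sets (or inside each cross-validation fold). Otherwise, copies of the same class B object can end up on both sides of the split and inflate the test score.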