
I am training a Naive Bayes classifier on a balanced dataset with an equal number of positive and negative examples. At test time I am computing the accuracy separately for the examples in the positive class, the negative class, and the subsets that make up the negative class. However, for some subsets of the negative class I get accuracy values lower than 50%, i.e. below the level of random guessing. I am wondering, should I worry about these results being much lower than 50%? Thank you!
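For concreteness, a minimal sketch of this setup, assuming a scikit-learn-style workflow (the data, the MultinomialNB choice, and the subset masks are placeholders, not the actual pipeline):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Placeholder data: X_* are count features, y_* are 0/1 labels (1 = positive class),
# and neg_subsets maps subset names to boolean masks over the test set.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 5, size=(200, 10))
y_train = rng.integers(0, 2, size=200)
X_test = rng.integers(0, 5, size=(100, 10))
y_test = rng.integers(0, 2, size=100)
neg_subsets = {
    "subset_a": (y_test == 0) & (np.arange(100) % 2 == 0),
    "subset_b": (y_test == 0) & (np.arange(100) % 2 == 1),
}

model = MultinomialNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy on the positive class, the negative class, and each negative subset.
print("positive:", accuracy_score(y_test[y_test == 1], y_pred[y_test == 1]))
print("negative:", accuracy_score(y_test[y_test == 0], y_pred[y_test == 0]))
for name, mask in neg_subsets.items():
    print(name, accuracy_score(y_test[mask], y_pred[mask]))
```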

Crista23

1 Answer


It's impossible to answer this fully without more specific details, so here are some guidelines instead:

If your dataset has an equal number of examples in each class, then random guessing gives you 50% accuracy on average.
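As a quick sanity check of that baseline, here is a small simulation of uniform random guessing on perfectly balanced 0/1 labels (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
y_true = np.repeat([0, 1], n // 2)       # perfectly balanced labels
y_guess = rng.integers(0, 2, size=n)     # uniform random guesses
print("random-guess accuracy:", (y_true == y_guess).mean())  # ~0.50
```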

To be clear, are you certain your model has learned anything from the training data? Is the training accuracy higher than 50%? If yes, continue reading.
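With a scikit-learn-style classifier, that check is a one-liner once the model is fitted; here is a sketch with placeholder data and GaussianNB standing in for whatever model you actually use:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Placeholder data and model; substitute your own features, 0/1 labels, and classifier.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 10))
y_train = rng.integers(0, 2, size=200)
model = GaussianNB().fit(X_train, y_train)

# A model that has actually learned something should be clearly above 0.5 here on real data.
train_acc = accuracy_score(y_train, model.predict(X_train))
print(f"training accuracy: {train_acc:.3f}")
```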

Assuming your validation set is large enough to rule out statistical fluctuations, accuracy consistently below 50% suggests that something is indeed wrong with your model.
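One way to judge whether a below-50% accuracy on a given subset is more than a fluctuation is a two-sided binomial test against chance, for example with SciPy (the counts below are made-up placeholders; binomtest requires SciPy 1.7+):

```python
from scipy.stats import binomtest

# Example: 180 correct out of 400 examples in one negative subset (45% accuracy).
n_correct, n_total = 180, 400
result = binomtest(n_correct, n_total, p=0.5, alternative="two-sided")
print(f"observed accuracy: {n_correct / n_total:.3f}, p-value vs. chance: {result.pvalue:.4f}")
# A small p-value means the deviation from 50% is unlikely to be a fluctuation at this
# sample size; it does not tell you *why* the model is below chance.
```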

For example, are the class labels accidentally switched somewhere in the validation dataset? Notice that if your accuracy is below 50%, then flipping the predictions, i.e. using 1 - model.predict(x), would push it above 50%.
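A quick way to test for that kind of label inversion is to score the flipped predictions directly; a sketch with made-up labels (y_val and y_pred stand in for your validation labels and the model's 0/1 predictions):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Placeholder validation labels and predictions; replace with your own arrays.
y_val = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])      # mostly wrong: 25% accuracy

acc = accuracy_score(y_val, y_pred)
flipped_acc = accuracy_score(y_val, 1 - y_pred)  # same predictions with labels inverted
print(acc, flipped_acc)  # 0.25 vs 0.75: if flipping helps this much, check your label encoding
```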

Alex R.