Naive Bayes text classification using TextBlob: every instance predicted as negative after adding more samples

Question

I am classifying documents as positive or negative using a Naive Bayes model. It seems to work fine on a small, balanced dataset of around 72 documents. But when I add more negatively labeled documents, the classifier predicts everything as negative.

I am splitting my dataset into an 80% training set and a 20% test set. Adding more negatively labeled documents definitely makes the dataset skewed. Could the skew be what makes the classifier predict every test document as negative? I am using the TextBlob/NLTK implementation of the Naive Bayes model.

Any idea?

score 4 · Accepted Answer · answered Mar 04 '14 at 14:56

Yes, it could be that your data set is biasing your classifier. Naive Bayes weights each class by a prior estimated from the label frequencies in the training data, so adding more negative documents raises the prior for the negative class. If there isn't a very strong signal in the features to tell the classifier which class to choose, it makes sense for it to select the most prevalent class (negative in your case). Have you tried plotting accuracy against the class distribution? Another thing to try is k-fold cross-validation, so that you are not drawing a biased 80-20 training-test split by chance.
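To make the k-fold suggestion concrete, here is a minimal sketch in plain Python. It is not TextBlob's or NLTK's own API: the helpers `stratified_k_folds` and `majority_baseline` are hypothetical names, and the 80/20 placeholder data is made up. The split is stratified (an assumption beyond plain k-fold) so that every fold keeps the same class ratio, and the baseline shows the accuracy you would get by always predicting the most common label.

```python
import random
from collections import Counter, defaultdict


def stratified_k_folds(samples, k=5, seed=0):
    """Yield (train, test) pairs whose label ratios mirror the full data set.

    `samples` is a list of (text, label) tuples (hypothetical helper).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    folds = [[] for _ in range(k)]
    for group in by_label.values():
        rng.shuffle(group)
        # Deal each class round-robin so every fold gets its share.
        for i, sample in enumerate(group):
            folds[i % k].append(sample)

    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def majority_baseline(train, test):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)


# Made-up skewed data: 80 negative and 20 positive placeholder documents.
data = [(f"neg doc {i}", "neg") for i in range(80)] + \
       [(f"pos doc {i}", "pos") for i in range(20)]

for train, test in stratified_k_folds(data, k=5):
    print(Counter(label for _, label in test), majority_baseline(train, test))
```

With this placeholder data every fold has 16 negative and 4 positive test documents, so the majority baseline is 0.8 in each fold. A classifier that scores about the same as this baseline on your real data is probably just predicting the prevalent class. Each `train`/`test` pair could then be passed to your actual classifier, for example `textblob.classifiers.NaiveBayesClassifier(train)` followed by `.accuracy(test)`.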

Danyule

Yes, I did 5 runs with different class distributions. I observed an increase in false negative predictions when negative instances dominate my training dataset. Thanks. – user2161903 Mar 05 '14 at 21:04
