
I'm using naive Bayes for text classification on 100k records, of which 88k are positive-class and 12k are negative-class. I converted the sentences to unigrams and bigrams with CountVectorizer, took 50 alpha values in the range [0, 10], and drew the plot below.

[plot: cross-validation accuracy vs. alpha]

With Laplace (additive) smoothing, if I keep increasing the alpha value, the accuracy on the cross-validation dataset also keeps increasing. Is this trend expected or not?
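A minimal sketch of this setup, assuming `MultinomialNB` (the question doesn't name the classifier class) and tiny placeholder data in place of the real 100k sentences:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Placeholder corpus; substitute the real sentences and labels
texts = ["good product", "bad service", "great value", "terrible support"]
labels = [1, 0, 1, 0]

# Unigrams + bigrams, as described in the question
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)

# 50 alpha values in [0, 10]; scikit-learn clips alpha=0 with a warning,
# so start slightly above zero
alphas = np.linspace(1e-3, 10, 50)
scores = [cross_val_score(MultinomialNB(alpha=a), X, labels, cv=2).mean()
          for a in alphas]

plt.plot(alphas, scores)
plt.xlabel("alpha")
plt.ylabel("cross-validation accuracy")
plt.show()
```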

  • Use both [RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html) and [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html): first RandomizedSearchCV, then GridSearchCV. That way `alpha` will be tuned more precisely. Also try a wide range of values for alpha, e.g. from 1e-4 to 1e3 (see the sketch after these comments). – Kalsi Sep 13 '18 at 18:35
  • Yeah, I used GridSearchCV, but accuracy keeps increasing as alpha increases. – Ravi Sep 13 '18 at 18:42
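The two-stage tuning from the comment could look like the sketch below: a coarse RandomizedSearchCV over a log-scale alpha range, then a GridSearchCV in a narrow band around the winner. The data is a random stand-in for the vectorized sentences, and the ranges and iteration counts are illustrative assumptions:

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

# Toy count data standing in for the vectorized sentences
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(100, 20))
y = rng.integers(0, 2, size=100)

# Stage 1: random search over a wide log-scale range (1e-4 to 1e3)
rand = RandomizedSearchCV(MultinomialNB(),
                          {"alpha": loguniform(1e-4, 1e3)},
                          n_iter=30, cv=3, random_state=0)
rand.fit(X, y)
best = rand.best_params_["alpha"]

# Stage 2: grid search in a narrow band around the stage-1 winner
grid = GridSearchCV(MultinomialNB(),
                    {"alpha": np.linspace(best / 3, best * 3, 10)},
                    cv=3)
grid.fit(X, y)
print(grid.best_params_)
```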

2 Answers


If you keep increasing the alpha value, the naive Bayes model becomes biased towards the class with more records and turns into a dumb majority-class model (underfitting), so choosing a small alpha value is a good idea.
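A toy illustration of this effect (the counts below are made up, not the question's data): as alpha grows, the smoothed word likelihoods flatten out across classes, so the class prior dominates and every point gets predicted as the majority class:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy imbalanced counts: 8 "positive" docs vs 2 "negative" docs
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(10, 6))
y = np.array([1] * 8 + [0] * 2)

for a in (0.1, 1.0, 1000.0):
    print(a, MultinomialNB(alpha=a).fit(X, y).predict(X))
# At alpha=1000 the smoothed word likelihoods are nearly uniform across
# classes, so the log-prior gap dominates and every row is predicted 1.
```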


You have 88k positive points and 12k negative points, which means you have an imbalanced dataset. You can add more negative points to balance it by cloning or replicating the negative points, which is called upsampling. Once the dataset is balanced, you can apply naive Bayes with alpha and it will work properly: the model is no longer a dumb majority-class model. Earlier your model was dumb, and that's why increasing alpha increased your accuracy; pushing predictions towards the majority class alone already scores about 88% accuracy on this data.
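Upsampling the minority class can be done, for example, with `sklearn.utils.resample` (the arrays below are small stand-ins for the real feature matrices):

```python
import numpy as np
from sklearn.utils import resample

# Stand-ins for the real feature rows (88k positive, 12k negative)
X_pos = np.random.rand(88, 5)
X_neg = np.random.rand(12, 5)

# Sample the minority class with replacement until it matches
# the majority class size
X_neg_up = resample(X_neg, replace=True, n_samples=len(X_pos),
                    random_state=0)

X_bal = np.vstack([X_pos, X_neg_up])
y_bal = np.array([1] * len(X_pos) + [0] * len(X_neg_up))
```

Note that replication should happen inside the training folds only; upsampling before splitting leaks duplicated rows into the validation set and inflates the cross-validation score.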