
Classification using multinomial naive Bayes is not working as I expect; see the code below.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction import DictVectorizer
import numpy as np

# training data
data = [
    {'house': 100, 'street': 50, 'shop': 25, 'car': 100, 'tree': 20},
    {'house': 5, 'street': 5, 'shop': 0, 'car': 10, 'tree': 500, 'river': 1},
]

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(data)
Y = np.array([10, 20])

mnb = MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
mnb.fit(X, Y)

# test data
test_data1 = [
    {'testname': 0, 'street': 0, 'shop': 0, 'car': 0, 'Hi': 0, 'Blue': 5},
]

print(mnb.predict(dv.transform(test_data1)))

The output is [10], but I was expecting [20].

What is wrong here? Is my understanding incorrect?

nmathew

1 Answer


Your test sample gives the same probability for both classes, 10 and 20. Here is an explanation of how multinomial Naive Bayes computes the probability of each class: https://medium.com/syncedreview/applying-multinomial-naive-bayes-to-nlp-problems-a-practical-explanation-4f5271768ebf

In your example, no feature in the test sample actually contributes to the prediction. The keys that do appear in the training data ('street', 'shop', 'car') all have a count of 0 in the test dict, and the remaining keys ('testname', 'Hi', 'Blue') were never seen by the DictVectorizer during fit, so it silently drops them. The transformed test vector is therefore all zeros, and the prediction falls back on the class priors.
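You can verify this by refitting the vectorizer on the training dicts from your question and inspecting the transformed test vector (the feature names here are just the keys from your own data):

```python
from sklearn.feature_extraction import DictVectorizer

# Training dicts from the question.
data = [
    {'house': 100, 'street': 50, 'shop': 25, 'car': 100, 'tree': 20},
    {'house': 5, 'street': 5, 'shop': 0, 'car': 10, 'tree': 500, 'river': 1},
]
dv = DictVectorizer(sparse=False)
dv.fit(data)

# 'testname', 'Hi' and 'Blue' were never seen during fit, so
# DictVectorizer drops them; the keys it does know are all 0 here.
test_data1 = [{'testname': 0, 'street': 0, 'shop': 0, 'car': 0, 'Hi': 0, 'Blue': 5}]
print(dv.transform(test_data1))  # -> [[0. 0. 0. 0. 0. 0.]]
```

With six known features (car, house, river, shop, street, tree), the transform produces a 1x6 all-zero row.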

Try running:

# Return probability estimates for the test vector.
print(mnb.predict_proba(dv.transform(test_data1)))

Both classes get a probability of 0.5 (with one training sample per class, the fitted priors are equal, and the all-zero test vector adds no evidence). On a tie, the model returns the first class, which is 10.
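To see the model prefer class 20, give the test dict a nonzero count for a feature the model has actually learned. As an illustration (this test dict is mine, not from your question), 'tree' is far more frequent in the class-20 training sample, so a test document dominated by 'tree' should tip the prediction:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction import DictVectorizer
import numpy as np

# Training data from the question.
data = [
    {'house': 100, 'street': 50, 'shop': 25, 'car': 100, 'tree': 20},
    {'house': 5, 'street': 5, 'shop': 0, 'car': 10, 'tree': 500, 'river': 1},
]
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(data)
Y = np.array([10, 20])

mnb = MultinomialNB(alpha=1.0)
mnb.fit(X, Y)

# 'tree' has count 500 in the class-20 document but only 20 in the
# class-10 document, so this test sample is assigned to class 20.
print(mnb.predict(dv.transform([{'tree': 10}])))  # -> [20]
```

Here P(tree | class 20) is roughly 0.95 after Laplace smoothing versus about 0.07 for class 10, so the likelihood overwhelms the equal priors.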

Swathy