I'm trying to build a positive/negative review classifier and wanted to use Multinomial Naive Bayes (or regular Naive Bayes). If I don't do feature selection with SelectKBest and chi2, it works fine. But if I do, I get the following error:
```
Traceback (most recent call last):
  File "<ipython-input-176-a426973d76d1>", line 1, in <module>
    bayes_predict = bayes.predict(X_dev)
  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 65, in predict
    jll = self._joint_log_likelihood(X)
  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 737, in _joint_log_likelihood
    return (safe_sparse_dot(X, self.feature_log_prob_.T) +
  File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/site-packages/sklearn/utils/extmath.py", line 142, in safe_sparse_dot
    return np.dot(a, b)
  File "<__array_function__ internals>", line 6, in dot
ValueError: shapes (5000,7001) and (4000,2) not aligned: 7001 (dim 1) != 4000 (dim 0)
```
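The last frame is a plain matrix product: `MultinomialNB` multiplies the input (n_samples × n_features) by `feature_log_prob_.T` (n_features seen during `fit` × n_classes), and `np.dot` requires the column count of the first operand to equal the row count of the second. This small sketch uses made-up shapes to trigger the same kind of `ValueError`; the sizes are illustrative only, not taken from the real data.

```python
import numpy as np

# Illustrative sizes: 3 samples with 5 features each...
X = np.ones((3, 5))
# ...against an (n_features, n_classes) matrix for a model fit on 4 features,
# playing the role of feature_log_prob_.T in the traceback.
W = np.ones((4, 2))

message = None
try:
    np.dot(X, W)  # inner dimensions 5 != 4, so numpy refuses
except ValueError as err:
    message = str(err)
print(message)

# Once the inner dimensions agree, the product succeeds: (3, 4) x (4, 2) -> (3, 2)
ok = np.dot(np.ones((3, 4)), W)
print(ok.shape)  # (3, 2)
```

In the traceback the same rule fails with 7001 columns on the left and 4000 rows on the right.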
I'll explain the structure of my code:
size(train_dataset) = (15000, 4)
size(dev_dataset) = (5000, 4)
size(test_dataset) = (5000, 4)
They are all pandas DataFrames. I used three types of features (with 5000, 2000, and 1 columns), so the train, dev, and test feature arrays look like this:
size(X_train) = (15000, 7001)
size(X_dev) = (5000, 7001)
size(X_test) = (5000, 7001)
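The column counts follow from placing the three feature groups side by side: 5000 + 2000 + 1 = 7001. A minimal sketch, assuming the groups are combined column-wise with `scipy.sparse.hstack` (the question doesn't show how they are built, so the empty blocks and their names are hypothetical placeholders):

```python
import numpy as np
from scipy import sparse

n_docs = 5000  # e.g. the number of rows in the dev set

# Hypothetical placeholders for the three feature groups (all-zero sparse blocks)
group_5000 = sparse.csr_matrix((n_docs, 5000))
group_2000 = sparse.csr_matrix((n_docs, 2000))
group_1 = sparse.csr_matrix(np.ones((n_docs, 1)))

# Column-wise concatenation: the widths add up, the row count stays the same
X_stacked = sparse.hstack([group_5000, group_2000, group_1], format="csr")
print(X_stacked.shape)  # (5000, 7001)
```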
For feature selection, training, and testing I use the following code:
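```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB

# Keep the 4000 features with the highest chi-squared scores on the training set
chitest = SelectKBest(score_func=chi2, k=4000)
chi = chitest.fit(X_train, Y_train)
X_train_new = chi.transform(X_train)

# Train on the reduced features, then predict on the dev set
bayes = MultinomialNB()
bayes.fit(X_train_new, Y_train)
bayes_predict = bayes.predict(X_dev)
print(classification_report(Y_test_gold, bayes_predict))
```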
This gives me the error above, but I really can't figure out why.
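`SelectKBest` learns during `fit` which 4000 of the 7001 columns to keep, and `MultinomialNB` is then fit on those 4000 columns only, so every matrix passed to `predict` has to go through the same fitted selector first. A sketch on synthetic data (random Poisson counts standing in for the real features, and `k=20` instead of 4000) shows two ways to do that: calling `chi.transform` on the dev data, or wrapping both steps in a scikit-learn `Pipeline` so the selection is applied automatically.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
# Synthetic non-negative counts standing in for the real features
# (chi2 and MultinomialNB both require non-negative inputs).
X_train = rng.poisson(1.0, size=(200, 50))
Y_train = rng.randint(0, 2, size=200)
X_dev = rng.poisson(1.0, size=(80, 50))

# Option 1: reuse the selector fitted on the training data for every split.
chi = SelectKBest(score_func=chi2, k=20).fit(X_train, Y_train)
X_train_new = chi.transform(X_train)  # (200, 20)
X_dev_new = chi.transform(X_dev)      # (80, 20): the same 20 columns as training
bayes = MultinomialNB().fit(X_train_new, Y_train)
bayes_predict = bayes.predict(X_dev_new)

# Option 2: a Pipeline fits the selector in fit() and reapplies it in predict().
model = make_pipeline(SelectKBest(score_func=chi2, k=20), MultinomialNB())
model.fit(X_train, Y_train)
pipe_predict = model.predict(X_dev)
```

Both options fit the same selector and classifier on the same data, so they should give identical predictions; the pipeline just makes it harder to forget the transform. `X_test` needs the same treatment. Separately, if `Y_test_gold` holds the test-set labels rather than the dev-set labels, `classification_report` would also need the dev labels to line up with `bayes_predict`.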