MultinomialNB fails with "ValueError: shapes not aligned" during prediction phase

Question

I am trying to train a MultinomialNB() classifier. I have a CSV file that I read into a dataframe (data), and I did some tokenizing and lemmatization on the data in order to get the most used words. The code for the model is this:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Bag-of-words features: keep the 5000 most frequent non-stop-words
max_features = 5000
count_vectorizer = CountVectorizer(max_features=max_features, stop_words="english")
sparse_matrix = count_vectorizer.fit_transform(Tweet_list).toarray()
y = data.iloc[:, 0].values  # labels are in the first column
x = sparse_matrix

# Hold out 10% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

Mn = MultinomialNB()
Mn.fit(x_train, y_train)
y_pred = Mn.predict(x_test)
print("Accuracy: ", Mn.score(y_pred.reshape(-1,1),y_test))

When I print the sizes of the variables:

print(y.size)
print(x.size)
print(x_train.size)
print(y_train.size)
print(x_test.size)
print("y test", y_test.size)
print("y pred", y_pred.size)

I get:

86460
432300000
389070000
77814
43230000
y test 8646
y pred 8646
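For reference, .size on a NumPy array is the total number of elements (the product of its dimensions), so it hides the 2-D structure that .shape reports. A minimal sketch with a small hypothetical array (x_small is illustrative only), plus the arithmetic behind the sizes above:

```python
import numpy as np

# .size is the total number of elements, i.e. the product of .shape
x_small = np.zeros((6, 5))  # hypothetical: 6 documents, 5 vocabulary terms
print(x_small.shape)  # (6, 5)
print(x_small.size)   # 30

# The same arithmetic explains the sizes printed in the question
n_docs, n_features = 86460, 5000
print(n_docs * n_features)  # x.size
print(77814 * n_features)   # x_train.size
print(8646 * n_features)    # x_test.size
```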

However, the code fails with ValueError: shapes (8646,1) and (5000,2) not aligned: 1 (dim 1) != 5000 (dim 0).

As far as I understand, the problem is somewhere in the computation behind these methods, where some np.dot(a, b) fails. Somehow y_pred or y_test (8646 elements) is being multiplied with something the size of the max_features vector (5000). That is the only place where the value 5000 appears.
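The failing product can be reproduced with plain NumPy. This is a minimal sketch that simply reuses the two shapes from the error message; reading the (5000, 2) matrix as 5000 features by 2 classes is an assumption based on that message:

```python
import numpy as np

# Shapes copied from the error message
a = np.zeros((8646, 1))   # the reshaped predictions
b = np.zeros((5000, 2))   # assumed: 5000 features x 2 classes

raised = False
try:
    np.dot(a, b)  # inner dimensions differ: 1 vs 5000
except ValueError as err:
    raised = True
    print(err)    # a "not aligned" shape error, as in the question
```

np.dot on 2-D arrays requires a.shape[1] == b.shape[0], so any (n, 1) input paired with a (5000, k) matrix fails this way.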

Can you print out shape instead of size? Also, at which line is the error occurring? — ranka47, Apr 13 '21 at 14:50
y (86460,) x (86460, 5000) x_train (77814, 5000) y_train (77814,) x_test (8646, 5000) y_test (8646,) y_pred (8646,) These are the shapes. Also the error was in the last line, print("Accuracy: ", Mn.score(y_pred.reshape(-1,1),y_test)) — Seth Hexflame, Apr 13 '21 at 20:33

Answer (answered Apr 14 '21 at 16:38)

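If you refer to the documentation of MultinomialNB, you can see that the first argument to the score function is NOT y_pred but X, the feature matrix (here x_test); the second argument is the true labels. Passing y_pred.reshape(-1,1) makes the model treat an (8646, 1) array as 8646 documents with a single feature, and that cannot be multiplied with the transposed per-class feature log-probabilities of shape (5000, 2), i.e. 5000 features by 2 classes, which is exactly the mismatch in the error. Hence, the call to the score function should be: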

print("Accuracy: ", Mn.score(x_test,y_test))

score calls self.predict(x_test) internally and returns the mean accuracy of those predictions against y_test, so you do not need to pass the predictions yourself.
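A minimal sketch of both calls on a tiny hypothetical corpus (docs and labels are invented stand-ins for Tweet_list and data, and the model is scored on its own training rows purely for brevity). It checks that score(X, y) matches accuracy_score(y, predict(X)), and that passing the predictions as X raises a ValueError. Depending on the scikit-learn version, that error may be reported as a feature-count mismatch rather than the "not aligned" message, but it is a ValueError either way:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical two-class toy corpus standing in for Tweet_list / data
docs = [
    "great game tonight", "loved the match", "what a great win",
    "terrible weather again", "awful traffic today", "hate this rain",
]
labels = np.array([1, 1, 1, 0, 0, 0])

x = CountVectorizer().fit_transform(docs).toarray()
model = MultinomialNB().fit(x, labels)

# Correct call: score(X, y) runs predict(X) internally, returns mean accuracy
acc = model.score(x, labels)
print("Accuracy:", acc)

# Incorrect call from the question: predictions passed where X is expected
y_pred = model.predict(x)
raised = False
try:
    model.score(y_pred.reshape(-1, 1), labels)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```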

Documentation should always be the first place to look when debugging your code.
