I have trained a Naive Bayes MultinomialNB model to predict whether an SMS is spam or not.
I get 2 classes as expected:
nb = MultinomialNB(alpha=0.0)
nb.fit(X_train, y_train)
print(nb.classes_)
#Output: ['ham' 'spam']
But when I print the coefficients, I get only one array:
print(nb.coef_)
#Output: [[ -7.33025958 -6.48296172 -32.55333508 ... -9.52748415 -32.55333508
-32.55333508]]
I have already done the same with another dataset that has 5 classes instead of 2. There it worked, and I got a matrix with 5 arrays.
Here is the whole code:
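import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

sms = pd.read_csv("spam-sms.csv", header=0, encoding="ISO-8859-1")
X = sms.iloc[:, 1].values
y = sms.iloc[:, 0].values
# Use one mask for both arrays so messages and labels stay aligned
mask = pd.notnull(X) & pd.notnull(y)
X_clean = X[mask]
y_clean = y[mask]
vectorizer = CountVectorizer()
X_cnt = vectorizer.fit_transform(X_clean)
X_train, X_test, y_train, y_test = train_test_split(
    X_cnt, y_clean, test_size=0.2, random_state=0)
nb = MultinomialNB(alpha=0.0)
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print(nb.coef_)
print(nb.classes_)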
And here is the code where it works with 5 classes:
reviews = pd.read_csv("amazon-unlocked-mobile.csv", encoding='utf-8')
X = reviews.iloc[:,4].values
X_clean = X[pd.notnull(X)]
y = reviews.iloc[:,3].values
y_clean = y[pd.notnull(X)]
vectorizer = CountVectorizer()
X_cnt = vectorizer.fit_transform(X_clean)
X_train, X_test, y_train, y_test = train_test_split(
    X_cnt, y_clean, test_size=0.2, random_state=0)
nb = MultinomialNB(alpha=0.0)
nb.fit(X_train, y_train)
y_predicted = nb.predict(X_test)
print(nb.coef_)
print(nb.classes_)
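For comparison, here is a minimal sketch that reproduces the two-class case without the CSV file. The toy messages, the labels, the names texts, labels, X_toy and nb_toy, and alpha=1.0 are all made up for illustration and are not part of my real code. It prints the shape of feature_log_prob_, which as far as I can tell holds one row per class even in the two-class case. In scikit-learn versions that still provide coef_, the two-class case appears to return only the row for the second class, which would match the single array above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy messages and labels, invented purely for illustration
texts = ["win cash now", "free prize win", "see you at lunch", "call me later"]
labels = ["spam", "spam", "ham", "ham"]

X_toy = CountVectorizer().fit_transform(texts)

# alpha=1.0 (the default) is an assumption here; it avoids the
# warning that very small smoothing values trigger
nb_toy = MultinomialNB(alpha=1.0).fit(X_toy, labels)

print(nb_toy.classes_)                  # class labels in sorted order
print(nb_toy.feature_log_prob_.shape)   # (n_classes, n_features)

# coef_ only exists in older scikit-learn releases, so guard the access
if hasattr(nb_toy, "coef_"):
    print(nb_toy.coef_.shape)
```

If that reading is right, nb.feature_log_prob_ would give the per-class log probabilities for both 'ham' and 'spam' in the SMS example.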