Here is code that classifies text into 10 categories and prints the overall accuracy of the algorithm at the end:

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfTransformer

df = pd.read_csv('data/wine_data.csv')

counter = Counter(df['variety'].tolist())
top_10_varieties = {i[0]: idx for idx, i in enumerate(counter.most_common(10))}
df = df[df['variety'].map(lambda x: x in top_10_varieties)]

description_list = df['description'].tolist()
varietal_list = [top_10_varieties[i] for i in df['variety'].tolist()]
varietal_list = np.array(varietal_list)

count_vect = CountVectorizer()
x_train_counts = count_vect.fit_transform(description_list)


tfidf_transformer = TfidfTransformer()
x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)

train_x, test_x, train_y, test_y = train_test_split(x_train_tfidf, varietal_list, test_size=0.3)

clf = MultinomialNB().fit(train_x, train_y)
y_score = clf.predict(test_x)

# Count how many predictions match the true labels
n_right = 0
for i in range(len(y_score)):
    if y_score[i] == test_y[i]:
        n_right += 1

print("Accuracy: %.2f%%" % ((n_right/float(len(test_y)) * 100)))

My question: how can I get a relevance score for each article in the dataset, like this:

Relevance scores

marc_s
tursunWali

1 Answer

You can get the probability estimates for your test set with the predict_proba method. Zipping each row with clf.classes_ pairs every class with its probability, which gives you a relevance score per class.

probs = clf.predict_proba(test_x)

# test_x is a sparse matrix, so use shape[0] instead of len()
for i in range(test_x.shape[0]):
    probs_classes = list(zip(clf.classes_, probs[i]))
    print(f"X = {test_x[i]}, Predicted = {probs_classes}")
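
As a minimal, self-contained sketch of the same idea, the snippet below pairs each row of predict_proba with clf.classes_ and sorts the pairs so the most relevant class comes first. The toy texts, the labels, and the helper name ranked_scores are made up for illustration and are not from the wine dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the wine descriptions and labels.
texts = [
    "ripe cherry and plum with soft tannins",
    "dark cherry plum and spice",
    "crisp green apple and citrus",
    "lemon apple and bright acidity",
]
labels = np.array([0, 0, 1, 1])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

new_texts = ["plum and cherry", "citrus and apple"]
X_new = vectorizer.transform(new_texts)
probs = clf.predict_proba(X_new)  # shape: (n_samples, n_classes)


def ranked_scores(row, classes):
    """Pair each class label with its probability, highest first."""
    pairs = zip(classes.tolist(), row.tolist())
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Use shape[0] rather than len(): len() is ambiguous for sparse matrices.
results = [ranked_scores(probs[i], clf.classes_) for i in range(X_new.shape[0])]
for text, scores in zip(new_texts, results):
    print(text, scores)
```

The columns of predict_proba follow the order of clf.classes_, so the zip is safe even if the labels are not 0..n-1.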
BcK
  • I'm afraid there is no model created, unlike in Keras. – tursunWali Mar 22 '21 at 01:06
  • @tursunWali Your model is your classifier; it was a typo. Change `model` to `clf`. – BcK Mar 22 '21 at 23:30
  • When I implement your method (after adding these two lines: clf=svm.SVC(probability=True) clf.fit(test_x, test_y)), it shows an error: "TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]" – tursunWali Mar 25 '21 at 04:51
  • I got probabilities for test_x this way: 'train_x, test_x, train_y, test_y = train_test_split(x_train_tfidf, varietal_list, test_size=0.3) clf=svm.SVC(probability=True) clf.fit(test_x, test_y) w.writerow(clf.predict_proba(test_x))'. I got the probabilities, but not in the form I want: there are no class names. – tursunWali Mar 25 '21 at 05:31
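
One possible way to get class names next to the probabilities, sketched under some assumptions: invert the name-to-id mapping (top_10_varieties in the question) and use it to label the predict_proba columns in a pandas DataFrame before writing CSV. The variety names, toy descriptions, and variable names below are hypothetical. The classifier here is MultinomialNB fitted on the training split; the same pattern should apply to any classifier that exposes classes_ and predict_proba, such as svm.SVC(probability=True).

```python
import io

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for the top_10_varieties mapping (name -> integer id).
variety_to_id = {"Pinot Noir": 0, "Chardonnay": 1}
id_to_variety = {idx: name for name, idx in variety_to_id.items()}

# Made-up toy descriptions; the real code would use df['description'].
train_texts = [
    "ripe cherry and plum with soft tannins",
    "dark cherry plum and spice",
    "crisp green apple and citrus",
    "lemon apple and bright acidity",
]
train_y = np.array([0, 0, 1, 1])
test_texts = ["plum and cherry", "citrus and apple"]

vectorizer = TfidfVectorizer()
train_x = vectorizer.fit_transform(train_texts)
test_x = vectorizer.transform(test_texts)

# Fit on the training split, then score the held-out texts.
clf = MultinomialNB().fit(train_x, train_y)

# Columns of predict_proba follow the order of clf.classes_.
columns = [id_to_variety[int(c)] for c in clf.classes_]
scores = pd.DataFrame(clf.predict_proba(test_x), columns=columns)
scores.insert(0, "description", test_texts)

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
scores.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Each row then holds one description followed by one named probability column per variety, so the CSV header carries the class names.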