I trained a text classifier on 4 dialects. This is the confusion matrix and the per-class precision I got:

confusion matrix

[[27  6  0 16]
 [ 5 18  0 21]
 [ 1  3  6  9]
 [ 0  0  0 48]]

The precision

[0.81818182 0.66666667 1.         0.5106383 ]
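For reference, these precision values can be reproduced from the matrix itself. The sketch below assumes the scikit-learn convention that rows are true classes and columns are predicted classes, so each precision is the diagonal entry divided by its column sum. It uses plain Python lists, and the variable names are illustrative only.

```python
# Confusion matrix copied from above (rows = true class, columns = predicted class).
cm = [
    [27, 6, 0, 16],
    [5, 18, 0, 21],
    [1, 3, 6, 9],
    [0, 0, 0, 48],
]

n = len(cm)
# Column sum j = number of samples predicted as class j.
col_sums = [sum(row[j] for row in cm) for j in range(n)]
# Precision of class j = correct predictions of j / all predictions of j.
precision = [cm[j][j] / col_sums[j] for j in range(n)]
print(precision)
```

Because the i-th precision value comes from the i-th column, the precision array and the matrix rows and columns share one class order.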

How do I know which row of the confusion matrix and which element of the precision array belongs to which dialect? I gave the classifier training data with the following labels:

Egyptian
Sudan
Iraqi
Jordan

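Here's the code; I used RandomForestClassifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

# X: training features, labels: training labels (plain strings)
classifier = RandomForestClassifier(n_estimators=1000, random_state=0)
classifier.fit(X, labels)
# y holds the test features here, not the test labels
test_pred = classifier.predict(y)
# average=None returns one precision value per class
precision_score(labels_test, test_pred, average=None)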

output:

array([0.91024735, 0.94929397, 0.98622273, 0.95343322])
John Sall
  • The outputs will be in the same order as the input labels. So however you encoded the labels, ordinally or one-hot, the same transformer can be applied in reverse to your output labels – G. Anderson May 10 '19 at 15:41
  • Maybe this will help. Also I think as the above comment says, it's in the same order as the input you gave to the classifier. https://stackoverflow.com/questions/47792803/printing-the-precision-from-a-confusion-matrix-in-python – ウィエム May 10 '19 at 16:01
  • @G.Anderson But the input labels are shuffled. I have one csv file that contains all 4 labels, so how do I know which one I gave it first? – John Sall May 10 '19 at 17:43
  • How did you transform your text labels into numeric labels? What method or function did you call? – G. Anderson May 10 '19 at 18:33
  • @G.Anderson I didn't transform them, they're string labels. I shuffled the training data and then passed them to the classifier – John Sall May 10 '19 at 18:34
  • What classifier are you using? Can you show some of your code? – G. Anderson May 10 '19 at 18:35
  • I used randomforest, okay, I will post the code – John Sall May 10 '19 at 18:36
  • @G.Anderson I edited the post – John Sall May 10 '19 at 18:42

1 Answer

`classifier.classes_` gives you the labels the classifier scores on, in the order they are stored in the classifier object. That should be the same order as the outputs you already have, though I would verify that by spot-checking some of your predictions.
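A minimal sketch of that lookup, on made-up data: the features and the shuffled label array below are hypothetical stand-ins for the real training set. With string labels, scikit-learn stores `classes_` in sorted order, regardless of the order in which the samples were passed to `fit`. Passing `labels=classifier.classes_` to the metric functions makes the ordering explicit rather than assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score

# Hypothetical toy data: random features, string labels in shuffled order.
rng = np.random.RandomState(0)
labels = np.array(["Egyptian", "Sudan", "Iraqi", "Jordan"] * 10)
rng.shuffle(labels)
X = rng.rand(len(labels), 2)

classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(X, labels)

# Sorted unique labels, independent of the shuffled input order.
class_order = list(classifier.classes_)
print(class_order)

# Scored on the training data only to keep the sketch short.
pred = classifier.predict(X)
cm = confusion_matrix(labels, pred, labels=classifier.classes_)
prec = precision_score(labels, pred, labels=classifier.classes_, average=None)

# Row i of the matrix and element i of the precision array both
# belong to classifier.classes_[i].
for name, row, p in zip(classifier.classes_, cm, prec):
    print(name, row, p)
```

If the same holds for the data in the question, the four rows and precision values would correspond to Egyptian, Iraqi, Jordan, Sudan, in that order.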

G. Anderson
  • What do you mean spot-checking of my predictions? – John Sall May 10 '19 at 19:43
  • I counted the numbers of each row in the confusion matrix, and then opened an excel sheet of the test labels and counted each label to match it with the numbers in the confusion matrix. – John Sall May 10 '19 at 19:47
  • What I meant was to call `classifier.predict_proba()`, which returns the class probabilities as, e.g., `[0.8, 0.1, 0. , 0.1]`, and compare that to `classifier.predict()` for a few samples to see whether the class with the highest predicted probability is the correct class in the assumed order. Just one more way to verify your assumptions. – G. Anderson May 10 '19 at 19:50
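The spot-check described in the last comment could look roughly like this. It is a sketch on hypothetical toy data, not the asker's dataset: each column of `predict_proba` is mapped back to a label through `classes_`, and the result is compared with `predict`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for the real features and labels.
rng = np.random.RandomState(0)
labels = np.array(["Egyptian", "Sudan", "Iraqi", "Jordan"] * 10)
rng.shuffle(labels)
X = rng.rand(len(labels), 2)

classifier = RandomForestClassifier(n_estimators=25, random_state=0)
classifier.fit(X, labels)

# One row per sample, one column per class, columns ordered as classes_.
proba = classifier.predict_proba(X[:5])

# Map the most probable column of each row back to its label.
from_proba = classifier.classes_[np.argmax(proba, axis=1)]
from_predict = classifier.predict(X[:5])

# If the assumed column order is right, the two arrays agree.
matches = bool(np.array_equal(from_proba, from_predict))
print(from_proba, from_predict, matches)
```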