
I'm working on multi-class classification in Python (4 classes). To obtain the results for each class separately, I used the following code:

import numpy as np
from sklearn.metrics import confusion_matrix

cnf_matrix = confusion_matrix(y_test, y_pred)

# Per-class counts derived from the confusion matrix
FP = cnf_matrix.sum(axis=0) - np.diag(cnf_matrix)
FN = cnf_matrix.sum(axis=1) - np.diag(cnf_matrix)
TP = np.diag(cnf_matrix)
TN = cnf_matrix.sum() - (FP + FN + TP)

# Work with floats for the ratio calculations below
FP = FP.astype(float)
FN = FN.astype(float)
TP = TP.astype(float)
TN = TN.astype(float)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
print('TPR : ',TPR)

# Specificity or true negative rate
TNR = TN/(TN+FP)
print('TNR : ',TNR)

# Precision or positive predictive value
PPV = TP/(TP+FP)
print('PPV : ',PPV)

# Fall out or false positive rate
FPR = FP/(FP+TN)
print('FPR : ',FPR)

# False negative rate
FNR = FN/(TP+FN)
print('FNR : ',FNR)

# Accuracy per class
ACC = (TP+TN)/(TP+FP+FN+TN)
print('ACC : ',ACC)

I obtained the following results:

TPR :  [0.98398792 0.99999366 0.99905393 0.99999548]
TNR :  [0.99999211 0.99997989 1.         0.99773928]
PPV :  [0.99988488 0.99996832 1.         0.99810887]
FPR :  [7.89469529e-06 2.01061605e-05 0.00000000e+00 2.26072224e-03]
FNR :  [1.60120846e-02 6.33705530e-06 9.46073794e-04 4.52196090e-06]
ACC :  [0.99894952 0.99998524 0.99999754 0.99896674]

Now, I want to calculate the average value of each metric. Should I just add the four values together and then divide the result by 4? For example, for the accuracy (ACC): (0.99894952 + 0.99998524 + 0.99999754 + 0.99896674) / 4? Or what should I do exactly? Help please.

  • Yes, the way to calculate the average is as you say; what is the problem with this? – j.doe May 11 '19 at 08:29
  • Thanks for answering me, sir. I used this method to calculate the ACC and the result was 0.99947476. After that I used `from sklearn.metrics import accuracy_score` and `accuracy_score(y_test, y_pred)`, and the result was quite different: 0.99894952. Why? The second method should give me the average result directly, but it's different from the method I mentioned before. – Phd student May 11 '19 at 08:42
  • Hi again. I haven't used `from sklearn.metrics import accuracy_score` yet and I don't know why it differs, but the answer `0.99947476` is correct; calculate it that way. – j.doe May 12 '19 at 03:51

1 Answer


Accuracy is the total number of correct predictions divided by the total number of predictions. Now let's say you have a test set with 45 entries and 4 classes:

class 1: 10 rows
class 2: 10 rows
class 3: 10 rows
class 4: 15 rows

Now the per-class accuracy is:

class 1: 1 (10/10)
class 2: 1 (10/10)
class 3: 1 (10/10)
class 4: 0.33 (5/15)

Now if you sum all the per-class accuracies and divide by 4, i.e. your approach, the answer will be (1 + 1 + 1 + 0.33) / 4 ≈ 0.83.

If you instead count the total number of correct predictions, that is 35 out of 45, the accuracy is 35/45 ≈ 0.78.

So the two methods are not the same. Taking the average of the per-class accuracies, i.e. what you are doing, will only give the overall accuracy if all classes are balanced; otherwise it's the wrong method.

You should calculate the total number of correct predictions and divide it by the total number of predictions, i.e. correct / (correct + wrong).
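
To make the difference concrete, here is a minimal sketch that reproduces the 45-entry example above. The labels 1-4 and the choice of sending class 4's wrong predictions to class 1 are made up just for illustration:

import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical ground truth: 10 samples each for classes 1-3, 15 for class 4
y_true = np.array([1]*10 + [2]*10 + [3]*10 + [4]*15)
# Hypothetical predictions: classes 1-3 fully correct, only 5 of class 4 correct
y_pred = np.array([1]*10 + [2]*10 + [3]*10 + [4]*5 + [1]*10)

# Per-class accuracy: fraction of each class's own samples predicted correctly
per_class = np.array([np.mean(y_pred[y_true == c] == c) for c in [1, 2, 3, 4]])
print('per class :', per_class)            # [1. 1. 1. 0.333...]

# Your approach: average the per-class values (macro average)
print('macro avg :', np.mean(per_class))   # (1 + 1 + 1 + 0.33) / 4 ≈ 0.83

# Overall accuracy: total correct / total predictions
print('overall   :', accuracy_score(y_true, y_pred))   # 35 / 45 ≈ 0.78

The macro average weights every class equally regardless of its size, while the overall accuracy weights every sample equally, so the two numbers only agree when the classes are balanced.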
