
Consider a three-class classification problem with the following confusion matrix.

cm_matrix = 
                predict_class1    predict_class2    predict_class3
                 ______________    ______________    ______________

Actual_class1         2000                 0                 0     
Actual_class2           34              1966                 0     
Actual_class3            0                 0              2000   



Multi-Class Confusion Matrix Output
                     TruePositive    FalsePositive    FalseNegative    TrueNegative
                     ____________    _____________    _____________    ____________

    Actual_class1        2000             34                0              3966    
    Actual_class2        1966              0               34              4000    
    Actual_class3        2000              0                0              4000    
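
For reference, here is a minimal Matlab sketch of how these per-class counts can be derived from the confusion matrix (this assumes cm_matrix is stored as a plain 3x3 numeric array with actual classes in rows and predicted classes in columns; the variable names are only illustrative):

cm_matrix = [2000    0    0;
               34 1966    0;
                0    0 2000];

total = sum(cm_matrix(:));            % total number of instances (6000)
TP = diag(cm_matrix);                 % correct predictions per class
FN = sum(cm_matrix, 2) - TP;          % rest of each row: class instances predicted as another class
FP = sum(cm_matrix, 1)' - TP;         % rest of each column: other classes predicted as this class
TN = total - TP - FN - FP;            % instances not involving this class at all

% reproduces the Multi-Class Confusion Matrix Output table above
table(TP, FP, FN, TN, 'RowNames', {'Actual_class1','Actual_class2','Actual_class3'})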

The formulas that I have used are:

Accuracy of each class = TP ./ (total instances of that class)

(formula based on an answer here: Individual class accuracy calculation confusion)

Sensitivity = TP ./ (TP + FN);

The implementation of it in Matlab is:

acc_1  = 100*(cm_matrix(1,1))/sum(cm_matrix(1,:)) = 100*(2000)/(2000+0+0) = 100
acc_2  = 100*(cm_matrix(2,2))/sum(cm_matrix(2,:)) =  100*(1966)/(34+1966+0) = 98.3
acc_3  = 100*(cm_matrix(3,3))/sum(cm_matrix(3,:)) = 100*(2000)/(0+0+2000) = 100

sensitivity_1 = 2000/(2000+0) = 1 (i.e. 100%) = acc_1
sensitivity_2 = 1966/(1966+34) = 0.983 (i.e. 98.3%) = acc_2
sensitivity_3 = 2000/2000 = 1 (i.e. 100%) = acc_3
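
The same computation as a loop (a sketch only; acc and sens are illustrative names, and cm_matrix is the numeric array from the sketch above), which reproduces these numbers and already shows that the two formulas return identical values:

nClasses = size(cm_matrix, 1);
acc  = zeros(1, nClasses);
sens = zeros(1, nClasses);
for i = 1:nClasses
    TP_i = cm_matrix(i,i);                      % correct predictions for class i
    FN_i = sum(cm_matrix(i,:)) - TP_i;          % class i instances predicted as another class
    acc(i)  = 100 * TP_i / sum(cm_matrix(i,:)); % my "accuracy of each class" formula
    sens(i) = 100 * TP_i / (TP_i + FN_i);       % my sensitivity formula
end
% acc  = [100  98.3  100]
% sens = [100  98.3  100]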

Question1) Is my formula for Accuracy of each class correct? For calculating the accuracy of each individual class, say the positive class, I should take the TP in the numerator. Similarly, for the accuracy of only the negative class, I should consider TN in the numerator of the accuracy formula. Is the same formula applicable to binary classification? Is my implementation of it correct?

Question2) Is my formula for sensitivity correct? Then how come I am getting the same answers as the individual class accuracies?

Sm1
  • Why do you doubt these formulas? What research have you done? How has your research led to your confusion, or at least failed to allay it? Has your application of these formulas failed to provide meaningful results? What is your actual question, because I'm about 95% sure what you posted isn't it. – beaker Mar 17 '20 at 19:41
  • Please see my updated question where I have explained in detail. The problem is that the formula for overall accuracy given everywhere is (TP+TN) ./ (TP+FP+FN+TN). I could not find any reference for calculating the individual class accuracy for multi-class classification, hence I had to borrow from the Matlab link. In the overall accuracy formula the denominator has `TN`, but for individual class accuracy there should not be `TN`, based on my understanding. I must have made a mistake with the formula for sensitivity of individual classes for the multi-class classification case. – Sm1 Mar 17 '20 at 19:54
  • Hence I posted, since I could not find a reference for the multi-class case anywhere. – Sm1 Mar 17 '20 at 19:54
  • If you look at the Wikipedia link in your other question, your accuracy formula is wrong. It should be `TP+TN / TP+TN+FP+FN`. – beaker Mar 17 '20 at 20:36
  • @beaker: The formula that you have written is for calculating the accuracy of the whole confusion matrix: `number of correct predictions / total samples`. If one needs to calculate the individual class accuracies then one should perhaps only consider: `number of correct predictions for class1 / number of samples in class1`. Similarly for the other classes. I think this formula can be extended to the multi-class case, as I finally found a toolbox. But there are 2 problems in that toolbox: https://www.mathworks.com/matlabcentral/fileexchange/60900-multi-class-confusion-matrix – Sm1 Mar 17 '20 at 23:26
  • (1) If you could kindly see the second `switch case` under the function `getvalues`, you will find the formula for calculating individual class accuracy: there is a `for loop` and those variables are used: `RefereceResult.AccuracyOfSingle=(TP ./ P)' = TP/(TP+FN)`; and another accuracy, `accuracy=(TP)./(P+N);`. So the denominator is different; I don't know why. (2) The formula for sensitivity is subsequently given as the same as that of accuracy. – Sm1 Mar 17 '20 at 23:28
  • `TP/TP+FN` is the recall. I have no idea why Random Internet Guy would label it as accuracy. – beaker Mar 18 '20 at 01:47
  • @beaker: thank you for taking out the time to look at that code. Indeed the denominator for accuracy is incorrect. But if I work out, using common sense, the accuracy of each class, then that would be: correct predictions for that class / total instances belonging to that class. Coincidentally, my answer matches the answers from running the code. However, sensitivity and accuracy for each class are coming out to be the same. This may be another coincidence. I have shown the working in the Question to show my point. If possible could you answer my Question. Are my implementation & answers correct? – Sm1 Mar 18 '20 at 01:57
  • Totally get your anger & frustration. Sorry – Sm1 Mar 18 '20 at 01:58
  • @beaker: I would really appreciate an answer. Even I thought that individual class accuracy would be similar to sensitivity or recall. But they are different, as mentioned in this post: https://www.researchgate.net/post/Can_someone_help_me_to_calculate_accuracy_sensitivity_of_a_66_confusion_matrix The way the accuracy for each class is calculated there matches my calculation under the Matlab implementation code snippets. However, by some coincidence my individual class accuracy values are coming out the same as recall even though the formula is different. – Sm1 Mar 18 '20 at 05:19

2 Answers


Question1) Is my formula for Accuracy of each class correct?

No, the formula you're using is for the Sensitivity (Recall). See below.

For calculating the accuracy of each individual class, say the positive class, I should take the TP in the numerator. Similarly, for the accuracy of only the negative class, I should consider TN in the numerator of the accuracy formula. Is the same formula applicable to binary classification? Is my implementation of it correct?

Accuracy is the ratio of the number of correctly classified instances to the total number of instances. TN, or the number of instances correctly identified as not being in a class, are correctly classified instances, too. You cannot simply leave them out.

Accuracy is also normally only used for evaluating the entire classifier for all classes, not individual classes. You can, however, generalize the accuracy formula to handle individual classes, as done here for computing the average classification accuracy for a multiclass classifier. (See also the referenced article.)

The formula they use for each class is:

Accuracy_i = (TP_i + TN_i) / (TP_i + FP_i + FN_i + TN_i)

As you can see, it is identical to the usual formula for accuracy, but we only take into account the individual class's TP and TN scores (the denominator is still the total number of observations). Applying this to your data set, we get:

acc_1 = (2000+3966)/(2000+34+0+3966) = 0.99433
acc_2 = (1966+4000)/(1966+0+34+4000) = 0.99433
acc_3 = (2000+4000)/(2000+0+0+4000)  = 1.00000

This at least makes more intuitive sense, since the first two classes had mis-classified instances and the third did not. Whether these measures are at all useful is another question.
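
As a quick check, here is a small Matlab sketch of this per-class accuracy (my own illustration, not part of any toolbox; it assumes cm_matrix is available as a plain numeric array with actual classes in rows and predicted classes in columns):

total = sum(cm_matrix(:));            % total number of observations (6000)
TP = diag(cm_matrix);                 % diagonal: correct predictions per class
FN = sum(cm_matrix, 2) - TP;          % rest of each row
FP = sum(cm_matrix, 1)' - TP;         % rest of each column
TN = total - TP - FN - FP;            % everything else

acc_per_class = (TP + TN) ./ total    % [0.99433; 0.99433; 1.00000]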


Question2) Is my formula for sensitivity correct?

Yes, Sensitivity is given as:

TP / (TP + FN)

which is the ratio of the instances correctly identified as being in this class to the total number of instances in the class. In a binary classifier, you are by default calculating the sensitivity for the positive class. If you instead treat the negative class as the class of interest, its sensitivity is the specificity (the true negative rate in the wikipedia article):

TN / (TN + FP)

When the roles are swapped, TN is nothing more than the TP for the negative class, and FP plays the role of FN. So it is natural to extend sensitivity to all classes as you have done.
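
A tiny binary example to illustrate this symmetry (the 2x2 matrix below is made-up data, purely for illustration):

% rows = actual (positive, negative), columns = predicted (positive, negative)
cm2 = [90 10;     % actual positive: 90 TP, 10 FN
        5 95];    % actual negative:  5 FP, 95 TN

sens_pos = cm2(1,1) / sum(cm2(1,:))   % 0.90 -> sensitivity of the positive class
sens_neg = cm2(2,2) / sum(cm2(2,:))   % 0.95 -> sensitivity of the negative class, i.e. the specificity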

Then how come I am getting the same answers as the individual class accuracies?

Because you're using the same formula for both.

Look at your confusion matrix:

cm_matrix = 
                predict_class1    predict_class2    predict_class3
                 ______________    ______________    ______________

Actual_class1         2000                 0                 0     
Actual_class2           34              1966                 0     
Actual_class3            0                 0              2000

TP for class 1 is obviously 2000

cm_matrix(1,1)

FN is the sum of the other two columns in that row. Therefore, TP+FN is the sum of row 1

sum(cm_matrix(1,:))

That's exactly the formula you used for the accuracy.

acc_1  = 100*(cm_matrix(1,1))/sum(cm_matrix(1,:)) = 100*(2000)/(2000+0+0) = 100
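
You can confirm in Matlab that the two expressions are literally the same computation (a quick sketch of mine, assuming cm_matrix is available as a plain numeric array with actual classes in rows and predicted classes in columns):

TP = diag(cm_matrix);                  % correct predictions per class
FN = sum(cm_matrix, 2) - TP;           % row sums minus the diagonal
isequal(TP ./ sum(cm_matrix, 2), TP ./ (TP + FN))   % returns true: same formula, two spellings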
beaker
  • Thank you for your answer & the link. However another answer here https://stackoverflow.com/questions/51255247/individual-class-accuracy-calculation-confusion puts a different formula (which I have used in multi-class classification) for individual class accuracy. He did not put `TN` in the numerator. He answered saying that the individual class accuracy is the `TP of that class/total instances in that class`. The example was for binary classification; I thought the same was applicable to multi-class. – Sm1 Mar 18 '20 at 18:27
  • It was probably a mistake. You did not put TN in the numerator in your example; Cris did not put it in his. You'll have to ask him why he did it that way. Whenever you get advice from Random Internet People, you have to research it yourself and see if their advice makes sense. In this case, it does not, to me. By the way, **I** am a Random Internet Person, so you will have to check that **my** advice makes sense. But I'm not going to sit here and try to bat down every formula from every link that you can come up with. – beaker Mar 18 '20 at 18:34
  • True, I get your point, I should do my research as well. Here is another link which puts down the same formula for individual class accuracy that I am harping about. The formula is called the User's accuracy: http://gis.humboldt.edu/OLM/Courses/GSP_216_Online/lesson6-2/metrics.html The confusion matrix there is flipped in comparison to mine, so the formula is TP/column total, where the column total indicates the number of instances of that class. – Sm1 Mar 18 '20 at 18:39
  • You didn't really just do that, did you? :-) – beaker Mar 18 '20 at 19:00
  • Please don't get me wrong, my intention is not to offend you or anyone. I am here to learn but I am not really convinced by both the answers and your point of view. Thus, I am just searching & researching to find the proper answer. – Sm1 Mar 18 '20 at 23:01
  • Good. Question everything. If you find an approach that seems to suit your situation better, great! Go ahead and post it, I'd love to learn something new. And if you have any questions about *my* reasoning, I'll be glad to clarify where I can. – beaker Mar 19 '20 at 00:10

Answer to question 1. It seems that accuracy is normally used only in binary classification; check this link. You refer to an answer on this site, but it also concerns a binary classification (i.e. classification into 2 classes only). You have more than two classes, so you should try something else, such as a one-versus-all classification for each class (for each class, treat the predictions as class_n versus non_class_n; a small sketch of this is shown below).

Answer to question 2. Same issue: this measure is appropriate for binary classification, which is not your case.

The formula for sensitivity is:

TP./(TP + FN)

The formula for accuracy is:

(TP)./(TP+FN+FP+TN)

See the documentation here.
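
To make the one-versus-all idea concrete, here is a small sketch (mine, purely illustrative) that builds the 2x2 binary confusion matrix for one class out of the multi-class matrix, assuming cm_matrix from the question is stored as a plain numeric array with actual classes in rows and predicted classes in columns:

i = 2;                                       % class of interest, e.g. class 2
TP = cm_matrix(i,i);                         % actual i, predicted i
FN = sum(cm_matrix(i,:)) - TP;               % actual i, predicted something else
FP = sum(cm_matrix(:,i)) - TP;               % actual something else, predicted i
TN = sum(cm_matrix(:)) - TP - FN - FP;       % neither actual nor predicted i

binary_cm = [TP FN; FP TN]                   % [1966 34; 0 4000] for class 2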

UPDATE

And if you wish to use the confusion matrix, you have:

TP on the diagonal, at the level of the class; FN as the sum of the other values in that class's row; FP as the sum of the other values in that class's column; and TN as everything else. In the function getvalues, start counting lines from the declaration of the function and check lines 30 and 31:

TP(i)=c_matrix(i,i);                          % correct predictions for class i (diagonal)
FN(i)=sum(c_matrix(i,:))-c_matrix(i,i);       % rest of row i: class i predicted as another class
FP(i)=sum(c_matrix(:,i))-c_matrix(i,i);       % rest of column i: other classes predicted as class i
TN(i)=sum(c_matrix(:))-TP(i)-FP(i)-FN(i);     % everything else

If you apply the accuracy formula, you obtain, after calculating and simplifying:

accuracy = c_matrix(i,i) / sum(c_matrix(:))

For the sensitivity you obtain, after simplifying:

sensitivity =  c_matrix(i,i) / sum(c_matrix(i,:))
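
A quick numeric check of these two simplifications (my own sketch, applied to cm_matrix from the question, assuming it is stored as a plain numeric array):

n = size(cm_matrix, 1);
accuracy    = zeros(1, n);
sensitivity = zeros(1, n);
for i = 1:n
    TP = cm_matrix(i,i);
    FN = sum(cm_matrix(i,:)) - TP;
    FP = sum(cm_matrix(:,i)) - TP;
    TN = sum(cm_matrix(:)) - TP - FP - FN;
    accuracy(i)    = TP / (TP + FN + FP + TN);   % equals cm_matrix(i,i) / sum(cm_matrix(:))
    sensitivity(i) = TP / (TP + FN);             % equals cm_matrix(i,i) / sum(cm_matrix(i,:))
end
% accuracy    = [0.3333  0.3277  0.3333]
% sensitivity = [1.0000  0.9830  1.0000]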

If you want to understand better, just check the links I sent you.

Catalina Chircu
  • Thank you for answering. There must be a way to find the individual class accuracies & individual sensitivities by taking into consideration the diagonal element, like in this example: https://www.mathworks.com/matlabcentral/fileexchange/60900-multi-class-confusion-matrix If you could kindly see the second `switch case` under the function `getvalues`, you will see the formula for calculating individual class accuracy: there is a `for loop` and those variables are used: `RefereceResult.AccuracyOfSingle=(TP ./ P)' = TP/(TP+FN)`; – Sm1 Mar 17 '20 at 23:20
  • Now this formula is the same as sensitivity which however differs from the one that you have. – Sm1 Mar 17 '20 at 23:21
  • Thank you for taking the time to go through that Matlab link. But my confusion was that there seems to be an error in the formula for sensitivity, which is why the answer for sensitivity = accuracy for individual classes. Also, could you please write out the mathematical formula for individual class accuracies, as that code provides two formulas & I don't know which one is correct: accuracy = `(TP)./(TP+FN+FP+TN)` or `(TP ./ (TP+FN))`? – Sm1 Mar 17 '20 at 23:47
  • Your accuracy formula is wrong but your sensitivity formula is right. Both formulas are correct in the Matlab code. You can also find these formulas in the link I sent you. Please click on it, it contains more detailed information. – Catalina Chircu Mar 18 '20 at 00:04
  • Can you please mention the correct formula for individual class accuracy? In that Matlab code there are 2 formulae; which one do you say is correct: (TP)./(TP+FN+FP+TN) or (TP ./ (TP+FN))? Also, using that formula, the accuracy and sensitivity values are coming out the same, as mentioned and worked out in my Question. Is this a coincidence then? – Sm1 Mar 18 '20 at 00:34
  • I already answered that. Please read my answer carefully. You have everything in it. – Catalina Chircu Mar 18 '20 at 09:40
  • @beaker: Updated. – Catalina Chircu Mar 18 '20 at 16:40
  • Thank you for updating your answer. The accuracy formula for each class that you have provided has in its denominator the total number of instances. I think the accuracy for each class should be the number of correct predictions for that class / total instances for that class, and not all the instances. Individual class accuracy = sensitivity for each class, as mentioned by @beaker. Thus, the formula should be the same. Can you please check? I looked into the links & based on them I think the denominator should be the instances belonging to that class. – Sm1 Mar 18 '20 at 17:22
  • I think you should carefully read the documentation on this subject, in order to fully understand the concepts here. You may read the pages from the links I sent and other ones that you might find. beaker provided the same accuracy formula as myself, and in TP+TN / TP+TN+FP+FN you have TP+TN+FP+FN = all the occurrences. On the other hand, I see from your previous message that you found the answer, so I do not understand exactly what you need help with. Please be more precise, or try to write a formula expressing your idea. – Catalina Chircu Mar 18 '20 at 17:37