
I have a problem calculating precision and recall for a classifier in MATLAB. I use the Fisher iris data (150 data points: 50 setosa, 50 versicolor, 50 virginica) and have classified it with the kNN algorithm. Here is my confusion matrix:

50     0     0
 0    48     2
 0     4    46

The correct classification rate is 96% (144/150), but how do I calculate precision and recall in MATLAB? Is there a built-in function? I know the formulas, precision = tp/(tp+fp) and recall = tp/(tp+fn), but I am lost identifying the components. For instance, can I say that the true positives are 144 from the matrix? What about false positives and false negatives? I would really appreciate any help. Thank you!

user2314737
user19565
  • How do you get to 144? – Dan Apr 07 '14 at 14:24
  • I have got this number by summing up the diagonal of confusion matrix, 50+48+46, considering as correctly classified data – user19565 Apr 07 '14 at 14:28
  • you have 3 classes? Are you sure precision and recall generalize to classification with more than 2 classes? – Dan Apr 07 '14 at 14:36
  • Yes, I have three classes. I think I have seen a similar paper considering 12 classes that reports precision and recall for the overall classification evaluation: "Trabelsi, D., Mohammed, S., Chamroukhi, F., Oukhellou, L. and Amirat, Y., An unsupervised approach for automatic activity recognition based on hidden Markov model regression (2013), in: IEEE Transactions on Automation Science and Engineering, Accepted as regular paper, DOI: 10.1109/TASE.2013.2256349" – user19565 Apr 07 '14 at 14:43
  • @user19565 http://stats.stackexchange.com/questions/51296/how-to-calculate-precision-and-recall-for-multiclass-classification-using-confus – Dan Apr 07 '14 at 14:58
  • Thank you for giving me right direction! – user19565 Apr 07 '14 at 15:39
  • In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved [1] (source: Wikipedia). I am not sure how you would apply this in a multi-class scenario, but my hypothesis is to report it in a one-vs-all setting. – Creative_Cimmons Oct 15 '14 at 10:55
  • look at [`perfcurve`](http://www.mathworks.com/help/stats/perfcurve.html) – Shai Mar 12 '15 at 15:06

4 Answers


To add to pederpansen's answer, here are some anonymous MATLAB functions for calculating the precision, recall and F1-score of each class, and the mean F1-score over all classes:

precision = @(confusionMat) diag(confusionMat)./sum(confusionMat,2);

recall = @(confusionMat) diag(confusionMat)./sum(confusionMat,1)';

f1Scores = @(confusionMat) 2*(precision(confusionMat).*recall(confusionMat))./(precision(confusionMat)+recall(confusionMat));

meanF1 = @(confusionMat) mean(f1Scores(confusionMat));
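Applied to the confusion matrix from the question, these give the per-class values worked out in the other answer (the definitions are repeated here so the snippet runs on its own):

```matlab
% Anonymous functions as defined above, repeated for self-containment:
precision = @(confusionMat) diag(confusionMat)./sum(confusionMat,2);
recall = @(confusionMat) diag(confusionMat)./sum(confusionMat,1)';
f1Scores = @(confusionMat) 2*(precision(confusionMat).*recall(confusionMat))./(precision(confusionMat)+recall(confusionMat));
meanF1 = @(confusionMat) mean(f1Scores(confusionMat));

M = [50 0 0; 0 48 2; 0 4 46];   % confusion matrix from the question
precision(M)   % [1.0000; 0.9600; 0.9200]
recall(M)      % [1.0000; 0.9231; 0.9583]
meanF1(M)      % approximately 0.9600
```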
Shane Halloran
  • Just a warning, the usage of confusionMat in precision and recall here is wrong, it should be transposed (row index representing real label index). Simple example where everything is predicted as label 2: recall([0, 5; 0, 5]) = [NaN, 0.5], while recall([0, 5; 0, 5].') = [0 1] returns the correct result. Otherwise, you will get wrong results. – vls Jul 11 '17 at 14:55

As Dan pointed out in his comment, precision and recall are usually defined for binary classification problems only.

But you can calculate precision and recall separately for each class. Let's annotate your confusion matrix a little bit:

          |                  true           |
          |      |  seto  |  vers  |  virg  |
          -----------------------------------
          | seto |   50        0        0
predicted | vers |    0       48        2
          | virg |    0        4       46

Here I assumed the usual convention holds, i.e. columns are used for true values and rows for values predicted by your learning algorithm. (If your matrix was built the other way round, simply take the transpose of the confusion matrix.)

The true positives (tp(i)) for each class (= row/column index) i are given by the diagonal element in that row/column. The true negatives (tn) are then given by the sum of the remaining diagonal elements. Note that we simply define the negatives for class i as "not class i".

If we define false positives (fp) and false negatives (fn) analogously as the sum of off-diagonal entries in a given row or column, respectively, we can calculate precision and recall for each class:

precision(seto) = tp(seto) / (tp(seto) + fp(seto)) = 50 / (50 + (0 + 0)) = 1.0
precision(vers) = 48 / (48 + (0 + 2)) = 0.96
precision(virg) = 46 / (46 + (0 + 4)) = 0.92

recall(seto) = tp(seto) / (tp(seto) + fn(seto)) = 50 / (50 + (0 + 0)) = 1.0
recall(vers) = 48 / (48 + (0 + 4)) = 0.9231
recall(virg) = 46 / (46 + (0 + 2)) = 0.9583

Here I used the class names instead of the row indices for illustration.

Please have a look at the answers to this question for further information on performance measures for multi-class classification problems, particularly if you want to end up with a single number instead of one number per class. Of course, the easiest way to do this is to simply average the values over all classes.

Update

I realized that you were actually looking for a MATLAB function to do this. I don't think there is any built-in function; on the MATLAB File Exchange I only found a function for binary classification problems. However, the task is simple enough that you can define your own functions like so:

function y = precision(M)
  y = diag(M) ./ sum(M,2);
end

function y = recall(M)
  y = diag(M) ./ sum(M,1)';
end

This will return a column vector containing the precision and recall values for each class, respectively. Now you can simply call

>> mean(precision(M))

ans =

    0.9600

>> mean(recall(M))

ans =

    0.9605

to obtain the average precision and recall values of your model.

jurgispods

Use the following MATLAB code:

   actual = ...     % true class labels
   predicted = ...  % labels predicted by the classifier
   cm = confusionmat(actual, predicted);
   cm = cm';        % transpose so rows correspond to predicted classes
   precision = diag(cm) ./ sum(cm, 2);
   overall_precision = mean(precision)
   recall = diag(cm) ./ sum(cm, 1)';
   overall_recall = mean(recall)
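For the Fisher iris / kNN setup from the question, `actual` and `predicted` could be obtained like this (a sketch assuming the Statistics and Machine Learning Toolbox; the choice of 5 neighbors is arbitrary, not taken from the question):

```matlab
% Sketch: obtain 'actual' and 'predicted' for the question's setup.
load fisheriris                        % meas: 150x4 features, species: labels
mdl = fitcknn(meas, species, 'NumNeighbors', 5);
predicted = resubPredict(mdl);         % predictions on the training data
actual = species;
cm = confusionmat(actual, predicted);  % rows = actual, columns = predicted
cm = cm';                              % transpose to match the code above
```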

Another approach

   confMat = [50, 0, 0; 0, 48, 2; 0, 4, 46];  % rows: predicted, columns: true

precision = zeros(1, size(confMat,1));
for i = 1:size(confMat,1)
    precision(i) = confMat(i,i) / sum(confMat(i,:));
end
precision(isnan(precision)) = [];  % drop classes that were never predicted
Precision = sum(precision) / numel(precision);

recall = zeros(1, size(confMat,2));
for i = 1:size(confMat,2)
    recall(i) = confMat(i,i) / sum(confMat(:,i));
end

Recall = sum(recall) / numel(recall);

F_score = 2*Recall*Precision / (Precision + Recall);
mpx