
So I've got the following results from Naïve Bayes classification on my data set:

[image: confusion matrix for classes a-g]

However, I'm stuck on how to interpret the results. I want to find and compare the accuracy of each class (a-g).

I know accuracy is found using this formula:

$$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$

However, let's take class a. If I divide the number of correctly classified instances (313) by the total number of actual a instances in row a (4953), I get ~6.32%. Is this the accuracy?

EDIT: If I use the column instead of the row, I get 313/1199 ≈ 26.1%, which seems a more reasonable number.

EDIT 2: I calculated the accuracy of a in Excel using the accuracy formula shown above, and it gives 84%:

[image: Excel calculation of the accuracy for class a]

This doesn't seem right, since the overall classification accuracy is only ~24%.

Has QUIT--Anony-Mousse
rshah

1 Answer


No -- all you've calculated is tp/(tp+fn), the number of correct identifications of class a divided by the total number of actual a examples. This is recall, not accuracy. You need to include the other two figures, fp and tn.

fp is the rest of the a column; tn is the sum of all the other figures, those in the non-a rows and columns (the 6x6 sub-matrix). This reduces all 35K+ trials to a 2x2 confusion matrix with labels a and not-a, the kind you're already familiar with.

Yes, you get to repeat that reduction for each of the seven classes. I recommend doing it programmatically.
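A minimal Python sketch of that reduction, assuming the confusion matrix is held as a list of lists with rows = actual class and columns = predicted class (the layout described above). The function names `one_vs_rest` and `per_class_accuracy` and the 3x3 matrix are hypothetical, for illustration only; they are not the data from the question.

```python
from typing import Dict, List

Matrix = List[List[int]]


def one_vs_rest(cm: Matrix, k: int) -> Dict[str, int]:
    """Collapse an NxN confusion matrix (rows = actual, columns = predicted)
    into the 2x2 counts for class index k versus all other classes."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                        # rest of row k
    fp = sum(cm[r][k] for r in range(n)) - tp   # rest of column k
    tn = total - tp - fn - fp                   # the (N-1)x(N-1) sub-matrix
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def per_class_accuracy(cm: Matrix) -> List[float]:
    """One-vs-rest accuracy (tp + tn) / total for every class."""
    total = sum(sum(row) for row in cm)
    result = []
    for k in range(len(cm)):
        c = one_vs_rest(cm, k)
        result.append((c["tp"] + c["tn"]) / total)
    return result


if __name__ == "__main__":
    # Hypothetical 3-class matrix, NOT the data from the question.
    cm = [[5, 3, 2],
          [1, 6, 3],
          [2, 2, 6]]
    for label, acc in zip("abc", per_class_accuracy(cm)):
        print(f"{label}: {acc:.3f}")
```

On that made-up matrix it prints a: 0.733, b: 0.700 and c: 0.700, while the overall accuracy (diagonal / total) is only 17/30 ≈ 0.567, the same pattern the question ran into.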


RESPONSE TO OP UPDATE

Your accuracy is that high because you have a huge number of true negatives: not-a samples that were correctly classified as not-a.
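A short counting argument (added here as an illustration, not part of the original answer) shows why the per-class figure sits well above the overall one. With K classes, N samples and E misclassified samples (the off-diagonal total), each misclassification is a false negative for its actual class and a false positive for its predicted class, so it counts against exactly two one-vs-rest accuracies:

$$\sum_{k=1}^{K} (fn_k + fp_k) = 2E \quad\Longrightarrow\quad \frac{1}{K}\sum_{k=1}^{K} \text{acc}_k = 1 - \frac{2E}{KN} = 1 - \frac{2\,(1 - \text{acc})}{K}$$

where acc = (N - E)/N is the overall accuracy. With K = 7 and an overall accuracy of about 24%, the average one-vs-rest accuracy comes out near 1 - 2(0.76)/7 ≈ 0.78, so 84% for class a is not surprising.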

Perhaps it doesn't feel right because our experience focuses more on the class in question. There are other statistics that handle that focus:

  • Recall is tp / (tp + fn) -- of all items actually in class a, what percentage did we correctly identify? This is the 6.32% figure.
  • Precision is tp / (tp + fp) -- of all items identified as class a, what percentage were actually in that class? This is the 26.1% figure you calculated.
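As a quick check, both figures can be reproduced in a few lines of Python using only the numbers quoted in the question: tp = 313, row a total = 4953 (so fn = 4953 - 313) and column a total = 1199 (so fp = 1199 - 313). The helper names `recall` and `precision` are hypothetical, chosen for illustration.

```python
def recall(tp: int, fn: int) -> float:
    """Of all items actually in the class, the fraction correctly identified."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Of all items predicted as the class, the fraction actually in it."""
    return tp / (tp + fp)


if __name__ == "__main__":
    tp = 313            # diagonal cell for class a
    row_a_total = 4953  # all actual a instances (row a)
    col_a_total = 1199  # all instances predicted as a (column a)
    fn = row_a_total - tp
    fp = col_a_total - tp
    print(f"recall(a)    = {recall(tp, fn):.2%}")     # 6.32%
    print(f"precision(a) = {precision(tp, fp):.2%}")  # 26.11%
```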
Prune
  • How would I go about doing this programmatically? – rshah Nov 03 '17 at 15:46
  • You sum the appropriate row and column slices from the 7x7 confusion matrix. – Prune Nov 03 '17 at 16:02
  • How are you operating Weka at the moment? If you're running it manually in the Explorer interface, right-click the result in the result list and choose **Save result buffer**, then import the file as a space-delimited file into a spreadsheet program where you can do the calculation. – nekomatic Nov 03 '17 at 16:23
  • How would I go about automating the calculations for all the classes `a-g` in Excel? I've copied the confusion matrix into the columns. – rshah Nov 03 '17 at 18:39
  • (1) This is a separate question; (2) you haven't posted your current data and coding attempt; (3) without that ... well, Stack Overflow is not a from-scratch coding service. – Prune Nov 03 '17 at 18:42
  • @Prune okay, I've done my calculations in Excel and I got 84% accuracy for a, but that doesn't seem right since the overall classification accuracy is only ~24%. I've edited the question to include the Excel snippet. – rshah Nov 03 '17 at 18:46
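For doing this programmatically, one alternative to the spreadsheet route is to parse the saved result buffer directly in Python. This is a sketch under an assumption: that the buffer contains Weka's usual confusion-matrix layout, a header line ending in `<-- classified as` followed by rows like `50  0  0 |  a = name`. The function name and the sample text are hypothetical illustrations, not output from the question's data.

```python
from typing import List, Tuple


def parse_weka_confusion_matrix(text: str) -> Tuple[List[str], List[List[int]]]:
    """Extract (labels, matrix) from a saved Weka result buffer.

    Assumes a header line containing '<-- classified as', followed by one
    row per class of the form '  50  0  0 |  a = name'.
    """
    lines = text.splitlines()
    start = next((i for i, line in enumerate(lines)
                  if "<-- classified as" in line), None)
    if start is None:
        raise ValueError("no confusion-matrix header found")
    labels: List[str] = []
    matrix: List[List[int]] = []
    for line in lines[start + 1:]:
        if "|" not in line:
            break  # the first line without '|' ends the matrix
        counts, label_part = line.split("|", 1)
        matrix.append([int(x) for x in counts.split()])
        labels.append(label_part.split("=", 1)[0].strip())
    return labels, matrix


if __name__ == "__main__":
    # Hypothetical sample in the assumed layout, NOT the question's data.
    sample = """=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 |  a = Iris-setosa
  0 49  1 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica
"""
    labels, cm = parse_weka_confusion_matrix(sample)
    print(labels)  # ['a', 'b', 'c']
    print(cm)      # [[50, 0, 0], [0, 49, 1], [0, 2, 48]]
```

The resulting list of lists can then be fed to a one-vs-rest reduction to get tp, fn, fp and tn for every class.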