
Let's assume that we have a classification problem with 3 classes and highly imbalanced data: class 1 has 185 data points, class 2 has 199, and class 3 has 720.

For calculating the AUC on a multiclass problem there is the macro-average method (giving equal weight to the classification of each label) and the micro-average method (considering each element of the label indicator matrix as a binary prediction), as described in the scikit-learn tutorial.
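For reference, this is roughly how I compute both averages with scikit-learn (a minimal sketch, not my actual pipeline: the labels only reproduce the class counts above, and the scores are random placeholders standing in for a model's `predict_proba` output):

```python
# Minimal sketch: micro- vs macro-averaged AUC with scikit-learn.
# Labels reproduce the class counts above (185 / 199 / 720); scores are placeholders.
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)

y_true = np.concatenate([np.full(185, 1), np.full(199, 2), np.full(720, 3)])
y_score = rng.dirichlet([1.0, 1.0, 1.0], size=len(y_true))  # placeholder class probabilities

# Binarize into a label indicator matrix so each entry can be treated as a
# binary prediction (micro) or averaged per class (macro).
y_bin = label_binarize(y_true, classes=[1, 2, 3])

print("micro-averaged AUC:", roc_auc_score(y_bin, y_score, average="micro"))
print("macro-averaged AUC:", roc_auc_score(y_bin, y_score, average="macro"))
```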

For such an imbalanced dataset, should micro-averaging or macro-averaging of the AUC be used?

I'm unsure because, for the confusion matrix shown below, I'm getting a micro-averaged AUC of 0.76 but a macro-averaged AUC of only 0.55.

[confusion matrix image]

BlackHawk
  • I'm voting to close this question as off-topic because it is not about programming. – desertnaut Aug 26 '18 at 18:40
  • micro-average should be the recommended one for an imbalanced dataset, but there seems to be some inconsistency between the example data you provided and the confusion matrix, e.g., for class 1, the number of data points (first row) in the confusion matrix does not sum to 200; likewise for classes 2 and 3. – Sandipan Dey Aug 26 '18 at 19:20
  • @SandipanDey Thank you very much for your answer. I have updated the question regarding the number of data points. But why do I get a so much higher value for micro-averaging than for macro-averaging with this confusion matrix? – BlackHawk Aug 26 '18 at 19:45
  • I’m voting to close this question because it belongs to https://datascience.stackexchange.com/ – jopasserat Jul 28 '21 at 19:43

1 Answer


Since the class with the majority of the data points is classified with much higher precision, the overall precision computed with micro-averaging is going to be higher than the one computed with macro-averaging.

Here, P1 = 12/185 = 0.06486486, P2 = 11/199 = 0.05527638, P3 = 670 / 720 = 0.9305556

The overall precision with macro-averaging = (P1 + P2 + P3) / 3 = 0.3502323, which is much less than the overall precision with micro-averaging = (12 + 11 + 670) / (185 + 199 + 720) = 0.6277174.
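As a quick sanity check, here is the same arithmetic in a few lines of Python (the per-class correct counts 12, 11 and 670 and the class sizes come straight from the numbers above, not from a full reconstruction of the confusion matrix):

```python
# Reproduces the arithmetic above from the diagonal counts and class sizes.
correct = [12, 11, 670]    # correctly classified points per class (diagonal)
totals = [185, 199, 720]   # data points per class, as given in the question

per_class = [c / t for c, t in zip(correct, totals)]  # P1, P2, P3
macro = sum(per_class) / len(per_class)               # ~0.3502
micro = sum(correct) / sum(totals)                    # ~0.6277

print(per_class)
print("macro:", macro, "micro:", micro)
```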

Same holds true for AUC.

Sandipan Dey
  • I see. But the precision values are both high and similar for macro-averaging and micro-averaging, yet the macro-averaged AUC is just at chance level. – BlackHawk Aug 26 '18 at 20:37
  • So which should be the preferred metric to report? Micro-averaged or macro-averaged AUC? – BlackHawk Aug 26 '18 at 20:38
  • By the way, should P1 = 12 / 185 and P2 = 19 / 199? – BlackHawk Aug 26 '18 at 20:41
  • If your goal is to have a final evaluation metric that captures how many data points are classified correctly irrespective of the class labels, then micro should be the one to go with, I think, but as seen in this example it is likely to be biased towards the evaluation results for the majority class. – Sandipan Dey Aug 26 '18 at 21:04
  • On the contrary, if your goal is to evaluate the classification at each class level, then macro is the one to go with, but being an average it will be biased towards the extreme values (as it was biased towards the low precision values in this example). – Sandipan Dey Aug 26 '18 at 21:09
  • So, the micro average is biased by the amount of class imbalance? But why is micro-averaging then recommended for imbalanced data? – BlackHawk Aug 26 '18 at 21:17
  • micro is recommended when we have data at the other extreme: take the example of fraud detection, where the majority of instances are non-fraud; if a classifier has high precision on the non-fraud instances and poor precision on the fraud instances, then macro will wrongly give an indication of overall good precision. – Sandipan Dey Aug 26 '18 at 21:24