
I use the Weka platform. I am working on an imbalanced dataset in which the majority class is the positive class. I aim to apply different classifiers and evaluate their performance using several evaluation metrics, including AUC.

My question is: Are there special procedures that should be done because the positive class is the majority class?

Muneera
  • I think this question is off-topic here, but please check this post [Binary classification with strongly unbalanced classes](https://stats.stackexchange.com/questions/235808/binary-classification-with-strongly-unbalanced-classes) and this article: [Learning from Imbalanced Classes](http://www.svds.com/learning-imbalanced-classes/). Also see [Should I balance my dataset for binary classification?](https://stats.stackexchange.com/questions/502890/should-i-balance-my-dataset-for-binary-classification) and [Binary classification in imbalanced data](https://stats.stackexchange.com/a/123623/240550). – Mario Jan 08 '23 at 22:14
  • Based on these statements: 1) the area under the ROC curve (AUC) is a good general statistic; it is equal to the probability that a randomly chosen positive example will be ranked above a randomly chosen negative example; 2) AUC measures the likelihood that, given two random points (one from the positive class and one from the negative class), the classifier will rank the point from the positive class higher than the one from the negative class (i.e., it really measures ranking performance). Is AUC inappropriate to use if the positive class is the majority class, because it ranks a positive point higher than a negative one? – Muneera Jan 09 '23 at 15:43
  • Actually, there are two points about *evaluation*. The first is that, for binary classification and anomaly detection tasks on imbalanced data, the PR curve has been shown to be more informative than the ROC curve ([Ref](https://stats.stackexchange.com/a/267283/240550)); the choice also depends on your chosen [approach](https://stats.stackexchange.com/a/133385/240550) and [options](https://stats.stackexchange.com/a/298862/240550). The second is that you must decide which class is the target/positive class (the majority class or the minority one?). In anomaly detection tasks over imbalanced data, the art is to find the anomalies, i.e., the minority class. – Mario Jan 09 '23 at 16:33
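A minimal pure-Python sketch of the two evaluation points above (not Weka code; `roc_auc` and `average_precision` are hypothetical helpers written for illustration, and the data are made-up random scores). AUC computed as the pairwise ranking probability is unchanged when you swap which class is called positive, provided the scores are reversed too, so it is not inherently biased against a majority positive class. Average precision (a summary of the PR curve) does depend on that choice: for a classifier with no skill it sits near the prevalence of whichever class is labelled positive.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


random.seed(0)
# Hypothetical data: 900 majority examples labelled positive (1), 100 minority examples (0).
labels = [1] * 900 + [0] * 100
# Random scores, i.e. a classifier with no skill.
scores = [random.random() for _ in labels]

# Swap roles: the minority class becomes positive, and its score is the reversed score.
flipped_labels = [1 - y for y in labels]
flipped_scores = [-s for s in scores]

auc_major = roc_auc(labels, scores)
auc_minor = roc_auc(flipped_labels, flipped_scores)
ap_major = average_precision(labels, scores)
ap_minor = average_precision(flipped_labels, flipped_scores)

print(f"AUC, majority positive: {auc_major:.3f}")
print(f"AUC, minority positive: {auc_minor:.3f}")
print(f"AP,  majority positive: {ap_major:.3f}")
print(f"AP,  minority positive: {ap_minor:.3f}")
```

If you relabel the classes but keep the same scores, the pairwise AUC becomes 1 − AUC, which is why the score must follow the class you call positive.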

1 Answer


The FAQ "I have unbalanced data, now what?" on the Weka wiki suggests either:

fracpete
  • Thank you. I aim to balance my dataset using oversampling. However, because the positive class in my dataset is the majority class, rather than the minority class as is usual, I was afraid this case should be handled with specific methods: all the papers and articles I have read deal with imbalanced datasets whose positive class is the minority class. – Muneera Jan 09 '23 at 04:54
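Random oversampling only looks at class counts, not at which class carries the "positive" label, so a majority positive class needs no special treatment. A minimal sketch, assuming plain random duplication with replacement (the helper `random_oversample` and the toy data are hypothetical, not Weka's own filter):

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of every smaller class until each class
    matches the largest class count. The class names play no role."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Sample with replacement from this class to fill the gap (empty for the largest class).
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Hypothetical toy data: "positive" is the majority class (8 rows vs 2).
rows = [[i, i * 0.5] for i in range(10)]
labels = ["positive"] * 8 + ["negative"] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(labels))
print(Counter(new_labels))
```

Here the minority "negative" class is duplicated up to 8 rows. Apply oversampling to the training data only (for example inside each cross-validation fold), never before splitting, so that duplicated rows do not leak into the test set.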