
How does sklearn's Logistic Regression handle the class imbalance created by the OvR (one-vs-rest) multiclass handling scheme?

The scikit-learn library provides a LogisticRegression estimator.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

One of the parameters of this estimator is multi_class (default value is 'auto').

If I change 'auto' to 'ovr', the one-vs-the-rest method is used to train a model for the multi-class problem.

With 'ovr', the strategy consists of fitting one binary classifier per class.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html

For example, suppose my dataset consists of 10 classes (the classes are evenly distributed).

OvR will then train 10 classifiers for me:

First: Class A vs. not Class A
Second: Class B vs. not Class B
...
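The one-classifier-per-class structure can be sketched directly with OneVsRestClassifier (a minimal example on a synthetic make_classification dataset, used here as a stand-in for the 10-class data described above):

```python
# Sketch: OvR fits one "class k vs. rest" binary classifier per class.
# The synthetic dataset is an assumption, standing in for the real data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=6,
                           n_classes=10, n_clusters_per_class=1,
                           random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One fitted binary estimator per class: 10 in total.
print(len(ovr.estimators_))
```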

My question is: how does scikit-learn handle the resulting imbalanced data? When training the first classifier, the number of not-A samples exceeds the number of Class A samples (around 9:1).
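To make the imbalance concrete: with 10 even classes, each binary subproblem sees roughly a 1:9 positive-to-negative split. One way to compensate (an assumption about the intended fix, not something the question states) is the class_weight='balanced' option, which reweights samples inversely to class frequency. A minimal sketch on synthetic data:

```python
# Sketch (synthetic data as a stand-in): each OvR subproblem is ~1:9,
# and class_weight='balanced' reweights samples to compensate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=10, n_clusters_per_class=1,
                           random_state=0)

# Binary subproblem for class 0: roughly 900 negatives vs. 100 positives.
print(np.bincount((y == 0).astype(int)))

clf = LogisticRegression(multi_class='ovr', class_weight='balanced',
                         max_iter=1000).fit(X, y)

# One row of coefficients per binary classifier: shape (10, 20).
print(clf.coef_.shape)
```

Note that without class_weight, LogisticRegression applies no rebalancing to the OvR subproblems by default; the weighting is opt-in.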

Comments:

  • Since you are a new user, allow me to advise a little: put the main question as early as possible, then provide details. Lots of text tends to scare away people who can potentially answer. – Shihab Shahriar Khan Apr 04 '20 at 09:54
  • You can check out the source code directly. It often helps more than you expect: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/multiclass.py#L133 – emremrah Apr 04 '20 at 11:18

0 Answers