
Simply put, I have an imbalanced dataset, and because of its size I have to train with partial_fit. Some of the methods I use can't naturally handle multiclass problems, so I am wrapping them in a OneVsRestClassifier. However, this makes it difficult to set class_weight.

Because I use partial_fit, setting class_weight='balanced' or 'auto' is not possible.

Is there a simple solution where I can use OneVsRestClassifier and still weight my classes?

Example code:

model = OneVsRestClassifier(SGDClassifier(loss='log', alpha=1e-4, penalty='elasticnet', class_weight=weights))
model.partial_fit(features, labels_i, classes=np.unique(labels_i))

where weights is a dictionary containing the weights for each class.

Frank
  • You could upsample your minority class or set `sample_weight` in the `partial_fit` to reflect your class imbalance. – piman314 Dec 05 '17 at 10:59
  • Also, you don't need to wrap most of the classifiers in `OneVsRestClassifier`; it's handled implicitly by `sklearn`. – piman314 Dec 05 '17 at 11:02
  • Adding to @ncfirth's comment, please see here for how scikit-learn handles multi-class and multi-label problems implicitly: http://scikit-learn.org/stable/modules/multiclass.html – Vivek Kumar Dec 05 '17 at 11:46
  • @ncfirth In my case I need `OneVsRestClassifier`, because I use `SGDClassifier`, `Perceptron` and `PassiveAggressiveClassifier`, all three of which are listed as multiclass together with `OneVsRestClassifier`. I chose these because they can handle sparse matrices and have `partial_fit`. Also, `sample_weight` only works if `class_weight` has been set, which I am unable to do because of `OneVsRestClassifier`. I was mostly looking for an option that avoids oversampling, or something I can use together with it, as I would have to upsample 1000x or more for some classes. – Frank Dec 05 '17 at 12:27
  • That's what @ncfirth is saying. You don't need to wrap. It's written on the page I posted above that they handle the multiclass situation by using the `ovr` strategy internally. No need for you to do it. – Vivek Kumar Dec 05 '17 at 12:51
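
The `sample_weight` suggestion from the comments could look roughly like the sketch below. It is a sketch under stated assumptions, not a confirmed fix for the asker's setup: the data is synthetic, the names `all_classes`, `weights` and `batch_size` are hypothetical, and the class counts are assumed to be available from a cheap first pass over the labels. It uses `SGDClassifier` directly, without the `OneVsRestClassifier` wrapper, and leaves `loss` at its default because the name of the logistic loss differs between scikit-learn versions. Note that `classes` is the full list of classes rather than `np.unique` of a single batch, since a batch may not contain every class.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in data: three classes, heavily imbalanced.
n_samples = 3000
labels = rng.choice([0, 1, 2], size=n_samples, p=[0.90, 0.08, 0.02])
features = rng.normal(size=(n_samples, 5)) + labels[:, None]

# partial_fit needs the complete set of classes, not just those in one batch.
all_classes = np.array([0, 1, 2])

# Assumption: label counts are known up front (e.g. from one pass over the labels).
# The weights follow the "balanced" heuristic: n_samples / (n_classes * count).
counts = np.bincount(labels, minlength=len(all_classes))
weights = {c: n_samples / (len(all_classes) * counts[c]) for c in all_classes}

model = SGDClassifier(alpha=1e-4, penalty="elasticnet", random_state=0)

batch_size = 500
for start in range(0, n_samples, batch_size):
    X_batch = features[start:start + batch_size]
    y_batch = labels[start:start + batch_size]
    # Translate the per-class weights into one weight per sample in this batch.
    sw = np.array([weights[c] for c in y_batch])
    model.partial_fit(X_batch, y_batch, classes=all_classes, sample_weight=sw)

predictions = model.predict(features)
```

Whether this is enough for classes that would otherwise need 1000x upsampling is an open question; very large per-sample weights can make SGD updates unstable, so the weights may need capping or a smaller learning rate.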
