
I noticed that my F-scores are slightly lower when I use scikit-learn's LogisticRegression classifier inside the following one-vs-rest wrapper than when I use LogisticRegression by itself for multi-class classification.

from sklearn.multiclass import OneVsRestClassifier

class MyOVRClassifier(OneVsRestClassifier):
    """
    This OVR classifier will always choose at least one label,
    regardless of the probability.
    """
    def predict(self, X):
        # Per-class probabilities for the (single) sample in X.
        probs = self.predict_proba(X)[0]
        p_max = max(probs)
        # Return every label tied for the highest probability.
        return [tuple(self.classes_[i] for i, p in enumerate(probs) if p == p_max)]
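
A minimal usage sketch (the toy data below is made up purely for illustration, not my actual dataset):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: 30 samples, 4 features, 3 classes.
rng = np.random.RandomState(0)
X_train = rng.rand(30, 4)
y_train = rng.randint(0, 3, size=30)

clf = MyOVRClassifier(LogisticRegression())
clf.fit(X_train, y_train)
print(clf.predict(X_train[:1]))  # e.g. [(2,)] -- label(s) tied at the max probability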

Since the LogisticRegression documentation states that it uses a one-vs-all strategy for multi-class problems, I'm wondering what factors could account for the difference in performance. My one-vs-rest LR classifier seems to over-predict one of the classes more than the LR classifier does on its own.
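
To make the comparison concrete, a sketch along these lines scores the two setups side by side (the dataset, split, and macro averaging are placeholder assumptions, and the stock OneVsRestClassifier stands in for the custom wrapper here):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Plain LogisticRegression (one-vs-rest by default in older scikit-learn;
# newer versions default to a multinomial formulation).
lr = LogisticRegression().fit(X_tr, y_tr)
print(f1_score(y_te, lr.predict(X_te), average='macro'))

# Explicit one-vs-rest wrapper around the same base classifier.
ovr = OneVsRestClassifier(LogisticRegression()).fit(X_tr, y_tr)
print(f1_score(y_te, ovr.predict(X_te), average='macro'))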

Nathan Breit

1 Answer


Just a guess, but probably when "no one votes" (no binary classifier assigns a meaningful probability to its positive class) you get many tiny floating-point values, and with LR they can underflow to zero. So instead of picking the class whose classifier is most confident, you end up tie-breaking among zeros, which systematically favors the same class. See an example here of the difference.
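
A quick toy illustration of the tie-breaking effect (not real classifier output, just an array): once every per-class score underflows to 0.0, argmax-style selection always returns the first index, which would over-predict one class exactly as described.

import numpy as np

# All per-class probabilities underflowed to zero: a many-way tie,
# and argmax resolves it by always picking index 0.
probs = np.zeros(3)
print(np.argmax(probs))  # 0 -- the first class wins every tie

# With non-degenerate (even extremely small) values, the most
# confident class wins as expected.
print(np.argmax(np.array([1e-300, 3e-300, 2e-300])))  # 1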

Raff.Edward