
I'm using the AdaBoostClassifier in Scikit-learn and always get an average probability of 0.5, regardless of how unbalanced the training set is. The class predictions (predict) seem to give correct estimates, but this isn't reflected in the predict_proba method, whose output always averages to 0.5.

If my "real" probability is 0.02, how do I transform the standardized probability to reflect that proportion?

1 Answer


Do you mean you get probabilities per sample that are 1/n_classes on average? That's necessarily the case; the probabilities reported by predict_proba are the conditional class probability distribution P(y|X) over all values for y. To produce different probabilities, perform any necessary computations according to your probability model.

Fred Foo
  • Yes. The naive Bayes classifiers apparently have a class_prior parameter (for instance [0.2, 0.8]). This seems to be what I'm looking for, even though AdaBoostClassifier doesn't allow it. Would I be right to just multiply the predict_proba output by the inverse of the class prior (1/0.2 or 1/0.8) to get a number corresponding to the class prior? – Ola Gustafsson Feb 08 '14 at 15:57
  • @OlaGustafsson You can multiply by whatever you want. If you renormalize afterwards, then what you have is a classifier with an additional prior, i.e. a kind of mixture model. – Fred Foo Feb 08 '14 at 17:58
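A minimal sketch of the reweighting described in the comments, i.e. scaling predict_proba by an additional class prior and renormalizing (this is an illustration, not code from the answer; the prior [0.98, 0.02] and the toy data are assumptions):

```python
# Sketch: apply an extra class prior to predict_proba and renormalize per row.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, weights=[0.98, 0.02], random_state=0)
clf = AdaBoostClassifier(random_state=0).fit(X, y)

proba = clf.predict_proba(X)   # conditional P(y | X); each row sums to 1

prior = np.array([0.98, 0.02])                              # assumed additional prior
weighted = proba * prior                                    # scale each column by its prior
adjusted = weighted / weighted.sum(axis=1, keepdims=True)   # renormalize rows to sum to 1
```

Adjusted hard predictions can then be taken as adjusted.argmax(axis=1); whether this is the appropriate correction depends on how the training-set class distribution differs from the distribution you expect at prediction time.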