
So I'm using scikit-learn to do some binary classification, and right now I'm trying the Logistic Regression classifier. After training the classifier, I print out the classification results and the probability of each class:

logreg = LogisticRegression()
logreg.fit(X_train, y_train)
print(logreg.predict(X_test))
print(logreg.predict_proba(X_test))

and so I get something like:

[-1 1 1 -1 1 -1...-1]
[[  8.64625237e-01   1.35374763e-01]
 [  3.57441028e-01   6.42558972e-01]
 [  1.67970096e-01   8.32029904e-01]
 [  9.20026249e-01   7.99737513e-02]
 [  1.20456011e-02   9.87954399e-01]
 [  6.48565595e-01   3.51434405e-01]...]

etc. So it looks like whenever the probability exceeds 0.5, that's the class the object is assigned to. I'm looking for a way to adjust this number so that, for example, the probability of class 1 must exceed 0.7 for the object to be classified as such. Is there a way to do this? I was looking at some parameters already, like 'tol' and 'weight', but I wasn't sure if they were what I was looking for or whether they were working...

MrDinkleburg
  • Possible duplicate of http://stackoverflow.com/questions/31417487/sklearn-logisticregression-and-changing-the-default-threshold-for-classification – Abhinav Arora Jul 27 '16 at 18:25
  • If you get the predicted probabilities (as you did), it's very easy to apply a custom threshold yourself: (probas[:,1] >= threshold).astype(int) – Stergios Jul 29 '16 at 10:48

1 Answer


You can set your own threshold like this:

import numpy as np

THRESHOLD = 0.7
# Label as class 1 only when its predicted probability exceeds the threshold
preds = np.where(logreg.predict_proba(X_test)[:, 1] > THRESHOLD, 1, 0)

Please refer to sklearn LogisticRegression and changing the default threshold for classification
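Here is a minimal self-contained sketch of the idea, using a synthetic dataset (the dataset and the 0.7 threshold are just illustrative choices, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, purely for illustration
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression()
logreg.fit(X_train, y_train)

THRESHOLD = 0.7
probas = logreg.predict_proba(X_test)[:, 1]   # probability of class 1
preds = np.where(probas > THRESHOLD, 1, 0)    # custom-threshold predictions

# logreg.predict uses the default 0.5 cut-off; raising the threshold to
# 0.7 can only make the positive class rarer, never more common
default_preds = logreg.predict(X_test)
print(preds.sum(), default_preds.sum())
```

Note that `predict_proba` returns one column per class (in the order of `logreg.classes_`), so `[:, 1]` picks out the probability of the second class; with labels like -1/1, check `logreg.classes_` to be sure which column is which.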

J. Doe