
In Logistic Regression for binary classification, when using predict(), how does the classifier decide which class (1/0) to assign?

Is it based on a probability threshold, i.e. 1 if the probability is > 0.5, else 0? If so, can this threshold be changed manually?

I know we get probabilities from predict_proba(), but I was curious about the predict() function!

neerdy30

1 Answer


Logistic Regression, like other classification models, returns a probability for each class. Being a binary predictor, it has only two classes.

From the source code, predict() returns the class with the highest class probability.

def predict(self, X):
    """Predict class labels for samples in X.
    Parameters
    ----------
    X : {array-like, sparse matrix}, shape = [n_samples, n_features]
        Samples.
    Returns
    -------
    C : array, shape = [n_samples]
        Predicted class label per sample.
    """
    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]
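For instance, here is a quick check on a small made-up dataset that predict() agrees with taking the argmax over the per-class probabilities from predict_proba():

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: one feature, binary labels.
X = np.array([[0.5], [1.5], [2.5], [3.5], [4.5], [5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)   # shape (n_samples, 2); each row sums to 1
labels = clf.predict(X)        # class with the highest probability

# predict() is equivalent to argmax over the class probabilities
assert np.array_equal(labels, clf.classes_[proba.argmax(axis=1)])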

So yes: since the two class probabilities sum to 1, a decision score greater than 0 corresponds to a predicted probability greater than 0.5, and predict() returns the class whose probability exceeds 50%.
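As for changing the threshold manually: predict() itself does not take a threshold argument, but you can apply your own cutoff to the probabilities from predict_proba(). A minimal sketch, continuing with the clf and X fitted above (the 0.3 cutoff is just an illustrative value):

# Apply a custom decision threshold to the positive-class probability.
# predict() has no threshold parameter, so the decision step is redone by hand.
threshold = 0.3
proba_pos = clf.predict_proba(X)[:, 1]               # P(class == clf.classes_[1])
custom_labels = (proba_pos >= threshold).astype(int)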

pault
  • Okay, thanks. So where does maximum likelihood estimation come into the picture? I'm sorry if it's totally unrelated. – neerdy30 Dec 14 '17 at 16:42
  • That's a question better suited to Math Overflow. Here's the [wiki page](https://en.wikipedia.org/wiki/Logistic_regression#Maximum_likelihood_estimation). – pault Dec 14 '17 at 16:44
  • MLE comes in when estimating the model coefficients from the training data. Predictions are then made on new data using those coefficients. – ilanman Dec 20 '17 at 14:31