Questions tagged [logistic-regression]

Logistic regression is a statistical classification model used for making categorical predictions.

Logistic regression is a statistical analysis method used for predicting and understanding categorical dependent variables (e.g., true/false, or multinomial outcomes) based on one or more independent variables (also called predictors, features, or attributes). The probabilities describing the possible outcomes of a single trial are modeled as a function of the predictors using a logistic function, as follows:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

A logistic regression model can be represented by:

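$$P(y = 1 \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$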

The logistic regression model has the useful property that each exponentiated regression coefficient can be interpreted as the odds ratio associated with a one-unit increase in the corresponding predictor, holding the other predictors fixed.
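The odds-ratio interpretation can be checked numerically. The following is a minimal sketch in plain Python, assuming a binary model with two predictors; the intercept, coefficients, and input values are hypothetical numbers chosen only for illustration, not taken from any fitted model.

```python
import math


def sigmoid(t):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(x, intercept, coefs):
    """P(y = 1 | x) for a binary logistic regression model."""
    linear = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(linear)


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical parameters (assumed for this example only).
intercept = -1.5
coefs = [0.8, -0.4]

# Increase the first predictor by one unit, keep the second fixed.
p_base = predict_proba([2.0, 1.0], intercept, coefs)
p_plus = predict_proba([3.0, 1.0], intercept, coefs)

# The ratio of the two odds matches exp(coefs[0]) up to rounding.
odds_ratio = odds(p_plus) / odds(p_base)
print(odds_ratio, math.exp(coefs[0]))
```

The ratio does not depend on the starting values of the predictors, which is why a single number per coefficient is enough to summarize the effect on the odds scale.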

Multinomial logistic regression (i.e., with three or more possible outcomes) is also sometimes called a Maximum Entropy (MaxEnt) classifier in the machine learning literature.
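In the multinomial case, the logistic function is commonly replaced by the softmax function, which turns one linear score per class into a probability distribution. The sketch below assumes that formulation; the score values are hypothetical. It also shows that with two classes, one of which has its score fixed at zero, softmax gives the same probability as the binary logistic function.

```python
import math


def softmax(scores):
    """Turn a list of real-valued class scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t):
    """Binary logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical linear scores for three classes.
probs = softmax([2.0, 0.5, -1.0])
print(probs, sum(probs))

# Two classes with the reference class score fixed at 0:
# the second probability equals sigmoid(1.3).
two_class = softmax([0.0, 1.3])
print(two_class[1], sigmoid(1.3))
```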


Tag usage

Questions on [logistic-regression] should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3746 questions
32
votes
4 answers

predict_proba for a cross-validated model

I would like to predict the probability from Logistic Regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores? # imports from…
29
votes
3 answers

plotting decision boundary of logistic regression

I'm implementing logistic regression. I managed to get probabilities out of it, and am able to predict a 2 class classification task. My question is: For my final model, I have weights and the training data. There are 2 features, so my weight is a…
user2773013
27
votes
2 answers

How to increase the model accuracy of logistic regression in Scikit python?

I am trying to predict the admit variable with predictors such as gre, gpa and ranks. But the prediction accuracy is very low (0.66). The dataset is given below. https://gist.github.com/abyalias/3de80ab7fb93dcecc565cee21bd9501a The first few rows of…
26
votes
4 answers

Difference between logistic regression and softmax regression

I know that logistic regression is for binary classification and softmax regression for multi-class problem. Would it be any differences if I train several logistic regression models with the same data and normalize their results to get a…
24
votes
2 answers

How to handle date variable in machine learning data pre-processing

I have a dataset that contains, among other variables, the timestamp of the transaction in the format 26-09-2017 15:29:32. I need to find possible correlations and predictions of the sales (let's say in logistic regression). My questions are: How to…
24
votes
3 answers

Python: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future

I came across a problem with comparing the predictions of my model with the labels of training set. The arrays I'm using have shapes: Training set (200000, 28, 28) (200000,) Validation set (10000, 28, 28) (10000,) Test set (10000, 28, 28)…
Arslan
24
votes
1 answer

Can multinomial models be estimated using Generalized Linear model?

In analysis of categorical data, we often use logistic regression to estimate relationships between binomial outcomes and one or more covariates. I understand this is a type of generalized linear model (GLM). In R, this is implemented with the glm…
hxd1011
22
votes
5 answers

Scikit: calculate precision and recall using cross_val_score function

I'm using scikit to perform a logistic regression on spam/ham data. X_train is my training data and y_train the labels ('spam' or 'ham') and I trained my LogisticRegression this way: classifier = LogisticRegression() classifier.fit(X_train,…
19
votes
3 answers

What do columns ‘rawPrediction’ and ‘probability’ of DataFrame mean in Spark MLlib?

After I trained a LogisticRegressionModel, I transformed the test data DF with it and got the prediction DF. Then, when I call prediction.show(), the output column names are: [label | features | rawPrediction | probability | prediction]. I know…
Hereme
19
votes
4 answers

Large fixed effects binomial regression in R

I need to run a logistic regression on a relatively large data frame with 480.000 entries with 3 fixed effect variables. Fixed effect var A has 3233 levels, var B has 2326 levels, var C has 811 levels. So all in all I have 6370 fixed effects. The…
Phil
19
votes
2 answers

How to perform logistic regression using vowpal wabbit on very imbalanced dataset

I am trying to use vowpal wabbit for logistic regression. I am not sure if this is the right syntax to do it. For training, I do ./vw -d ~/Desktop/new_data.txt --passes 20 --binary --cache_file cache.txt -f lr.vw --loss_function logistic --l1…
user34790
19
votes
2 answers

Logistic regression - defining reference level in R

I am going nuts trying to figure this out. How can I in R, define the reference level to use in a binary logistic regression? What about the multinomial logistic regression? Right now my code is: logistic.train.model3 <- glm(class~ x+y+z, …
blast00
18
votes
2 answers

What is C parameter in sklearn Logistic Regression?

What is the meaning of the C parameter in sklearn.linear_model.LogisticRegression? How does it affect the decision boundary? Do high values of C make the decision boundary non-linear? What does overfitting look like for logistic regression if we…
18
votes
2 answers

Getting a low ROC AUC score but a high accuracy

Using a LogisticRegression class in scikit-learn on a version of the flight delay dataset. I use pandas to select some columns: df = df[["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", "ORIGIN", "DEST", "CRS_DEP_TIME", "ARR_DEL15"]] I fill in NaN values…
Jon
18
votes
3 answers

Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function: t = 1 ./ (1 +…