Questions tagged [logistic-regression]

Logistic regression is a statistical classification model used for making categorical predictions.

Logistic regression is a statistical method for predicting and understanding categorical dependent variables (e.g., true/false or multinomial outcomes) based on one or more independent variables (also called predictors, features, or attributes). The probabilities of the possible outcomes of a single trial are modeled as a function of the predictors using the logistic function:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

A logistic regression model can be represented by:

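$$P(y = 1 \mid x_1, \dots, x_k) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}$$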
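As a rough illustration of the model described above, the following sketch evaluates the logistic function on a linear combination of predictors. It assumes NumPy is available; the helper names `logistic` and `predict_proba`, and the coefficient values, are hypothetical and chosen only for this example.

```python
import numpy as np


def logistic(t):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))


def predict_proba(X, intercept, coefs):
    """Return P(y = 1 | x) for each row of X under a logistic regression model."""
    return logistic(intercept + X @ coefs)


# Hypothetical data and coefficients, chosen only for illustration.
X = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]])
intercept = 0.0
coefs = np.array([1.0, -0.5])

probs = predict_proba(X, intercept, coefs)
print(probs)
```

The first two rows have a linear predictor of exactly zero, so their predicted probability is 0.5; the third row has a negative linear predictor and therefore a probability below 0.5.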

A useful property of the logistic regression model is that each exponentiated regression coefficient can be interpreted as the odds ratio associated with a one-unit increase in the corresponding predictor, holding the other predictors fixed.
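The odds-ratio interpretation can be checked numerically. The sketch below assumes NumPy and a single-predictor model with made-up values for `intercept` and `beta`; it compares the ratio of odds at `x + 1` and `x` with `exp(beta)`.

```python
import numpy as np


def logistic(t):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-t))


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical intercept and coefficient for a single predictor.
intercept, beta = -1.0, 0.7
x = 2.0

p_at_x = logistic(intercept + beta * x)
p_at_x_plus_1 = logistic(intercept + beta * (x + 1.0))

# Ratio of the odds after and before a one-unit increase in x.
odds_ratio = odds(p_at_x_plus_1) / odds(p_at_x)

print(odds_ratio, np.exp(beta))
```

The two printed numbers agree up to floating-point error, and the same holds for any starting value of `x`, which is why the exponentiated coefficient alone summarizes the effect.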

Multinomial logistic regression (i.e., with three or more possible outcomes) is also sometimes called a Maximum Entropy (MaxEnt) classifier in the machine learning literature.
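In the multinomial case, class probabilities are commonly obtained by applying the softmax function to one linear score per class. The following is a minimal NumPy sketch with hypothetical scores; it also checks that, with two classes and one score fixed at zero, softmax reduces to the logistic function.

```python
import numpy as np


def softmax(scores):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)


# Hypothetical per-class linear scores for two observations and three classes.
scores = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
probs = softmax(scores)
print(probs)

# Two-class special case: softmax([t, 0]) gives the logistic function of t.
t = 1.3
two_class = softmax(np.array([t, 0.0]))
sigmoid_t = 1.0 / (1.0 + np.exp(-t))
print(two_class[0], sigmoid_t)
```

Each row of `probs` sums to one, and equal scores yield equal probabilities for all classes.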


Tag usage

Questions tagged [logistic-regression] should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3746 questions
12
votes
1 answer

Why does TensorFlow's documentation call a softmax's input "logits"?

TensorFlow calls each of the inputs to a softmax a logit. They go on to define the softmax's inputs/logits as: "Unscaled log probabilities." Wikipedia and other sources say that a logit is the log of the odds, and the inverse of the sigmoid/logistic…
12
votes
1 answer

Evaluating Logistic regression with cross validation

I would like to use cross validation to test/train my dataset and evaluate the performance of the logistic regression model on the entire dataset and not only on the test set (e.g. 25%). These concepts are totally new to me and I am not very sure if…
S.H
  • 137
  • 1
  • 1
  • 10
12
votes
1 answer

xgboost binary logistic regression

I am having problems running logistic regression with xgboost that can be summarized in the following example. Let's assume I have a very simple dataframe with two predictors and one target variable: df= pd.DataFrame({'X1' : pd.Series([1,0,0,1]),…
12
votes
2 answers

Comparison of R, statsmodels, sklearn for a classification task with logistic regression

I have made some experiments with logistic regression in R, Python statsmodels and sklearn. While the results given by R and statsmodels agree, there is some discrepancy with what is returned by sklearn. I would like to understand why these results…
jfb
  • 679
  • 6
  • 9
12
votes
1 answer

Creating a sklearn.linear_model.LogisticRegression instance from existing coefficients

Can one create such an instance based on existing coefficients which were calculated say in a different implementation (e.g. Java)? I tried creating an instance then setting coef_ and intercept_ directly and it seems to work but I'm not sure if…
jonathans
  • 320
  • 3
  • 9
12
votes
4 answers

Regularized logistic regression code in matlab

I'm trying my hand at regularized LR, simple with these formulas in matlab: The cost function: J(theta) = 1/m*sum((-y_i)*log(h(x_i)-(1-y_i)*log(1-h(x_i))))+(lambda/2*m)*sum(theta_j) The gradient: ∂J(theta)/∂theta_0 = [(1/m)*(sum((h(x_i)-y_i)*x_j)]…
Pedro.Alonso
  • 1,007
  • 3
  • 20
  • 41
11
votes
1 answer

How to split data based on a column value in sklearn

I have a data file with the following columns: 'customer', 'calibrat' - Calibration sample = 1; Validation sample = 0; 'churn', 'churndep', 'revenue', 'mou'. The data file contains some 40000 rows, out of which 20000 have value for calibrat as 1. I want to…
11
votes
1 answer

How to implement polynomial logistic regression in scikit-learn?

I'm trying to create a non-linear logistic regression, i.e. polynomial logistic regression using scikit-learn. But I couldn't find how I can define a degree of polynomial. Did anybody try it? Thanks a lot!
Inna
  • 663
  • 8
  • 12
11
votes
3 answers

Logistic Regression on factor: Error in eval(family$initialize) : y values must be 0 <= y <= 1

Not able to fix the below error for the below logistic regression training=(IBM$Serial<625) data=IBM[!training,] stock.direction <- data$Direction training_model=glm(stock.direction~data$lag2,data=data,family=binomial) ###Error### ---- Error in…
Akhil Doppalapudi
  • 153
  • 1
  • 1
  • 4
11
votes
1 answer

Neural Network (No hidden layers) vs Logistic Regression?

I've been taking a class on neural networks and don't really understand why I get different results from the accuracy score from logistic regression, and a two layer neural network (input layer and output layer). The output layer is using the…
11
votes
2 answers

Using cross validation and AUC-ROC for a logistic regression model in sklearn

I'm using the sklearn package to build a logistic regression model and then evaluate it. Specifically, I want to do so using cross validation, but can't figure out the right way to do so with the cross_val_score function. According to the…
11
votes
1 answer

glmnet: How do I know which factor level of my response is coded as 1 in logistic regression

I have a logistic regression model that I made using the glmnet package. My response variable was coded as a factor, the levels of which I will refer to as "a" and "b". The mathematics of logistic regression label one of the two classes as "0" and…
John Kleve
  • 499
  • 1
  • 4
  • 12
11
votes
3 answers

statsmodels logistic regression odds ratio

I'm wondering how I can get the odds ratio from a fitted logistic regression model in Python statsmodels. >>> import statsmodels.api as sm >>> import numpy as np >>> X = np.random.normal(0, 1, (100, 3)) >>> y = np.random.choice([0, 1], 100) >>> res =…
Donbeo
  • 17,067
  • 37
  • 114
  • 188
11
votes
2 answers

How to fix statsmodels warning: "Maximum no. of iterations has exceeded"

I am using Anaconda and I am trying logistic regression. After loading the training data set and performing the regression, I got the following warning message. train_cols = data.columns[1:] logit = sm.Logit(data['harmful'],…
dave
  • 141
  • 1
  • 3
  • 8
11
votes
3 answers

How can I use multi cores processing to run glm function faster

I'm a bit new to R and I would like to use a package that allows multi-core processing in order to run the glm function faster. I wonder if there is a syntax that I can use for this. Here is an example glm model that I wrote, can I add a…
mql4beginner
  • 2,193
  • 5
  • 34
  • 73