Questions tagged [logistic-regression]

Logistic regression is a statistical classification model used for making categorical predictions.

Logistic regression is a statistical analysis method for predicting and understanding categorical dependent variables (e.g., true/false or multinomial outcomes) based on one or more independent variables (also called predictors, features, or attributes). The probabilities describing the possible outcomes of a single trial are modeled as a function of the predictors using the logistic function:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

A logistic regression model can be represented by:

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$

The logistic regression model has the useful property that each exponentiated regression coefficient can be interpreted as the odds ratio associated with a one-unit increase in the corresponding predictor, holding the other predictors fixed.
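The odds-ratio property can be checked numerically. The following is a minimal sketch for a single predictor, not tied to any particular library; the coefficient values `beta0` and `beta1` and the helper names are hypothetical and chosen only for illustration.

```python
import math


def logistic(t: float) -> float:
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(x: float, beta0: float, beta1: float) -> float:
    """P(y = 1 | x) for a one-predictor logistic regression model."""
    return logistic(beta0 + beta1 * x)


def odds(p: float) -> float:
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.5, 0.8

# Compare the odds at x = 2 and x = 3 (a one-unit increase).
p_at_2 = predict_proba(2.0, beta0, beta1)
p_at_3 = predict_proba(3.0, beta0, beta1)
odds_ratio = odds(p_at_3) / odds(p_at_2)

# The two printed values agree up to floating-point rounding.
print(odds_ratio, math.exp(beta1))
```

The same ratio results for any starting value of `x`, because the log-odds are linear in the predictor.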

Multinomial logistic regression (i.e., with three or more possible outcomes) is also sometimes called a Maximum Entropy (MaxEnt) classifier in the machine learning literature.
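In the multinomial case, class probabilities are commonly obtained by applying the softmax function to one linear score per class. The sketch below assumes the per-class scores have already been computed; the score values are hypothetical. It also illustrates that, with two classes, softmax reduces to the logistic function of the score difference.

```python
import math


def softmax(scores):
    """Turn a list of real-valued class scores into probabilities."""
    # Subtract the maximum score for numerical stability; the result is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores for three classes.
probs = softmax([2.0, 1.0, 0.1])

# With two classes, softmax equals the logistic function of the score difference.
two_class = softmax([1.3, 0.0])
binary = 1.0 / (1.0 + math.exp(-1.3))

print(probs, two_class[0], binary)
```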


Tag usage

Questions on logistic-regression should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3746 questions
7
votes
1 answer

Problems setting up conditional search space in hyperopt

I'll fully admit that I may be setting up the conditional space wrong here but for some reason, I just can't get this to function at all. I am attempting to use hyperopt to tune a logistic regression model and depending on the solver there are some…
7
votes
1 answer

R geepack: unreasonably large estimates using GEE

I am using geepack for R to estimate a logistic marginal model with geeglm(). But I am getting garbage estimates. They are about 16 orders of magnitude too large. However, the p-values seem to be similar to what I expected. This means that the response…
Mikkel Rev
  • 863
  • 3
  • 12
  • 31
7
votes
2 answers

Cross Validation function for logistic regression in R

I come from a predominantly Python + scikit-learn background, and I was wondering how one would obtain the cross-validation accuracy for a logistic regression model in R. I was searching and was surprised that there's no easy way to do this. I'm looking…
John Bennet
  • 81
  • 1
  • 1
  • 2
7
votes
4 answers

How to predict new values using statsmodels.formula.api (python)

I trained the logistic model using the following, from breast cancer data and ONLY using one feature 'mean_area' from statsmodels.formula.api import logit logistic_model = logit('target ~ mean_area',breast) result = logistic_model.fit() There is a…
7
votes
1 answer

Testing the Proportional Odds Assumption in R

I am working in R with a response variable that is the letter grade the student received in a specific course. The response is ordinal, and, in my opinion, seems logically proportional. My understanding is that I need to test that it is…
7
votes
2 answers

Different Sigmoid Equations and its implementation

When reviewing through the Sigmoid function that is used in Neural Nets, we found this equation from https://en.wikipedia.org/wiki/Softmax_function#Softmax_Normalization: Different from the standard sigmoid equation: The first equation on top…
alvas
  • 115,346
  • 109
  • 446
  • 738
7
votes
2 answers

Major assumptions of machine learning classifiers (LG, SVM, and decision trees)

In classical statistics, people usually state what assumptions are made (e.g., normality and linearity of data, independence of data). But when I am reading machine learning textbooks and tutorials, the underlying assumptions are not always…
KubiK888
  • 4,377
  • 14
  • 61
  • 115
7
votes
2 answers

How can I get the relative importance of features of a logistic regression for a particular prediction?

I am using a Logistic Regression (in scikit) for a binary classification problem, and am interested in being able to explain each individual prediction. To be more precise, I'm interested in predicting the probability of the positive class, and…
7
votes
0 answers

Incorporating random intercepts in R package rms for mixed effects logistic regression

Frank Harrell's R package rms is an amazing tool for implementing multiple logistic regression. However, I wish to know how, or if, it is possible to incorporate random effects into a model run through rms. I know that rms can run through nlme, but only…
7
votes
1 answer

What's the relationship between an SVM and hinge loss?

My colleague and I are trying to wrap our heads around the difference between logistic regression and an SVM. Clearly they are optimizing different objective functions. Is an SVM as simple as saying it's a discriminative classifier that simply…
Simon
  • 2,840
  • 2
  • 18
  • 26
7
votes
3 answers

classification: PCA and logistic regression using sklearn

Step 0: Problem description. I have a classification problem, i.e., I want to predict a binary target based on a collection of numerical features, using logistic regression, after running a Principal Components Analysis (PCA). I have 2 datasets:…
ldocao
  • 129
  • 2
  • 12
7
votes
1 answer

Spark, MLlib: Adjusting classifier discrimination threshold

I am trying to use Spark MLlib Logistic Regression (LR) and/or Random Forest (RF) classifiers to create a model to discriminate between two classes represented by sets whose cardinalities differ quite a lot. One set has 150 000 000 negative and another…
zork
  • 2,085
  • 6
  • 32
  • 48
7
votes
4 answers

Error in glm() in R

I would like to perform a logistic regression but get errors - don't know where the mistake might be. The structure of my data: 'data.frame': 3911 obs. of 29 variables: $ vn1 : Factor w/ 2 levels "maennlich","weiblich": 1 1 2 1 1 2…
user5071089
7
votes
1 answer

splitting data into test and train, making a logistic regression model in pandas

I'm trying to run this code: (credit goes to Greg) import pandas as pd from sklearn.model_selection import train_test_split import statsmodels.api as sm quality = pd.read_csv("https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv") train, test…
alkamid
  • 6,970
  • 4
  • 28
  • 39
7
votes
1 answer

Vowpal Wabbit Logistic Regression

I am performing logistic regression using Vowpal Wabbit on a dataset with 25 features and 48 million instances. I have a question about the current predict values: should they be between 0 and 1? average since example example current …