Questions tagged [logistic-regression]

Logistic regression is a statistical classification model used for making categorical predictions.

Logistic regression is a statistical analysis method used for predicting and understanding categorical dependent variables (e.g., true/false, or multinomial outcomes) based on one or more independent variables (also called predictors, features, or attributes). The probabilities describing the possible outcomes of a single trial are modeled as a function of the predictors using the logistic function:

$$\sigma(t) = \frac{e^{t}}{1 + e^{t}} = \frac{1}{1 + e^{-t}}$$

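A logistic regression model with predictors $x_1, \dots, x_k$ and coefficients $\beta_0, \dots, \beta_k$ can be represented by:

$$\ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

where $p$ is the probability of the outcome of interest.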
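As a minimal sketch, the logistic function and the linear-predictor form of the model can be written in plain Python. The function names `logistic`, `logit`, and `predict_proba` and the example coefficients are illustrative assumptions, not part of any library.

```python
import math


def logistic(t: float) -> float:
    """Logistic (sigmoid) function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1); the inverse of logistic()."""
    return math.log(p / (1.0 - p))


def predict_proba(x, beta0, betas):
    """Probability of the outcome of interest for one observation x.

    beta0 is the intercept and betas holds one coefficient per predictor.
    """
    linear = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return logistic(linear)


# Hypothetical coefficients: the linear predictor is -1.0 + 0.5 * 2.0 = 0.
p = predict_proba([2.0], beta0=-1.0, betas=[0.5])
```

A linear predictor of zero corresponds to a probability of 0.5, which is why 0.5 is the usual default threshold when turning probabilities into class labels.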

The logistic regression model has the nice property that the exponentiated regression coefficients can be interpreted as odds ratios associated with a one unit increase in the predictor.
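The odds-ratio interpretation can be checked numerically. The sketch below assumes a model with a single predictor and made-up coefficients (`beta0` and `beta1` are hypothetical values chosen for illustration); it compares the odds at `x + 1` with the odds at `x`.

```python
import math


def odds(x: float, beta0: float, beta1: float) -> float:
    """Odds p / (1 - p) under a one-predictor logistic model."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.5, 0.8

# Ratio of the odds at x + 1 to the odds at x, for several starting points.
ratios = [
    odds(x + 1, beta0, beta1) / odds(x, beta0, beta1)
    for x in (-2.0, 0.0, 3.5)
]

# The exponentiated coefficient is the odds ratio for a one-unit increase.
odds_ratio = math.exp(beta1)
```

Every entry of `ratios` agrees with `odds_ratio` up to floating-point error, regardless of the starting value of `x`. The same does not hold for the probabilities themselves, whose change per unit of `x` depends on where you start.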

Multinomial logistic regression (i.e., with three or more possible outcomes) is also sometimes called a Maximum Entropy (MaxEnt) classifier in the machine learning literature.


Tag usage

Questions on [logistic-regression] should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

3746 questions
6 votes, 2 answers

Difference between glm and LogitModelFit

I have a problem with the glm function in R. Specifically, I am not sure how to include nominal variables. The results that I get in R after running the glm function are the following: > df x1 x2 y 1 a 2 0 2 b 4 1 3 a 4 0 4 b 2 1 5 a 4…
6 votes, 1 answer

Logistic Regression Gradient Descent

I have to do Logistic regression using batch gradient descent. import numpy as np X = np.asarray([ [0.50],[0.75],[1.00],[1.25],[1.50],[1.75],[1.75], [2.00],[2.25],[2.50],[2.75],[3.00],[3.25],[3.50], [4.00],[4.25],[4.50],[4.75],[5.00],[5.50]]) y =…
6 votes, 1 answer

Spark: Extracting summary for a ML logistic regression model from a pipeline model

I've estimated a logistic regression using pipelines. My last few lines before fitting the logistic regression: from pyspark.ml.feature import VectorAssembler from pyspark.ml.classification import LogisticRegression lr =…
user3245256
6 votes, 3 answers

How to do regression as opposed to classification using logistic regression and scikit learn

The target variable that I need to predict is a probability (as opposed to a label). The corresponding column in my training data is also in this form. I do not want to lose information by thresholding the targets to create a classification problem…
san
6 votes, 2 answers

numpy TypeError: ufunc 'invert' not supported for the input types, and the inputs

For the code below: def makePrediction(mytheta, myx): # ----------------------------------------------------------------- pr = sigmoid(np.dot(myx, mytheta)) pr[pr < 0.5] =0 pr[pr >= 0.5] = 1 return pr #…
Mona Jalal
6 votes, 2 answers

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels

I did a logistic regression: EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial") Moreover, I want to predict everwrk for each level of r_maritl. r_maritl has the following levels: levels(NH11$r_maritl) "0 Under 14 years" "1…
Hadsga
6 votes, 1 answer

what is raw prediction in Logistic Regression in spark mllib?

I have run binary logistic regression using spark mllib. As per the documentation of spark mllib, RawPrediction are confidence values, which I assume are probabilities for lcl and ucl. I am getting negative values for RawPrediction. In what scenarios, raw…
6 votes, 3 answers

Reproducing LASSO / Logistic Regression results in R with Python using the Iris Dataset

I'm trying to reproduce the following R results in Python. In this particular case the R predictive skill is lower than the Python skill, but this is usually not the case in my experience (hence the reason for wanting to reproduce the results in…
6 votes, 1 answer

Logistic regression: plotting decision boundary from theta

I have the following code: x1 = np.random.randn(100) y1 = np.random.randn(100) + 3 x2 = np.random.randn(100) + 3 y2 = np.random.randn(100) plt.plot(x1, y1, "+", x2, y2, "x") plt.axis('equal') plt.show() which results in the following image I have…
user5368737
6 votes, 1 answer

Can I extract significance values for Logistic Regression coefficients in pyspark

Is there a way to get the significance level of each coefficient we receive after we fit a logistic regression model on training data? I was trying to find a way but could not figure it out myself. I think I may get the significance level of each…
6 votes, 1 answer

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1.0

I have a training dataset of 8670 trials and each trial has a length of 125-time samples while my test set consists of 578 trials. When I apply SVM algorithm from scikit-learn, I get pretty good results. However, when I apply logistic regression,…
user5499279
6 votes, 1 answer

Calculating standard error of estimate, Wald-Chi Square statistic, p-value with logistic regression in Spark

I was trying to build a logistic regression model on sample data. The only output we can get from the model is the weights of the features used to build it. I could not find a Spark API for standard error of estimate, Wald-Chi Square statistic, p-value…
6 votes, 3 answers

Plotting a multiple logistic regression for binary and continuous values in R

I have a data frame of mammal genera. Each row of the column is a different genus. There are three columns: a column of each genus's geographic range size (a continuous variable), a column stating whether or not a genus is found inside or outside of…
Sharon McMullen
6 votes, 3 answers

How to evaluate cost function for scikit learn LogisticRegression?

After using sklearn.linear_model.LogisticRegression to fit a training data set, I would like to obtain the value of the cost function for the training data set and a cross validation data set. Is it possible to have sklearn simply give me the value…
Corey
6 votes, 2 answers

Speeding up matrix-vector multiplication and exponentiation in Python, possibly by calling C/C++

I am currently working on a machine learning project where - given a data matrix Z and a vector rho - I have to compute the value and slope of the logistic loss function at rho. The computation involves basic matrix-vector multiplication and log/exp…
Berk U.