Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical method that estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
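As a rough illustration of the description above, the following sketch fits a straight line by ordinary least squares with NumPy. The data are synthetic and the variable names (`x`, `y`, `X`, `coef`) are assumptions made for this example only; it shows the simplest case, where the mean of Y varies linearly with a single independent variable.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3*x plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Design matrix with an intercept column, then an ordinary least squares fit.
X = np.column_stack([np.ones_like(x), x])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Estimated conditional mean of y at a new value x = 5.
y_hat = intercept + slope * 5.0
print(f"intercept={intercept:.3f} slope={slope:.3f} y_hat={y_hat:.3f}")
```

The estimated intercept and slope should land close to the values used to generate the data. Libraries such as statsmodels or scikit-learn wrap the same idea with more diagnostics and options.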

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
24
votes
2 answers

Fixed effect in Pandas or Statsmodels

Is there an existing function to estimate fixed effects (one-way or two-way) in Pandas or Statsmodels? There used to be a function in Statsmodels, but it seems to have been discontinued. And in Pandas, there is something called plm, but I can't import it or run…
user3576212
24
votes
1 answer

How can I force cv.glmnet not to drop one specific variable?

I am running a regression with 67 observations and 32 variables. I am doing variable selection using the cv.glmnet function from the glmnet package. There is one variable I want to force into the model. (It is dropped during the normal procedure.) How can I…
lareven
24
votes
3 answers

Any Python Library Produces Publication Style Regression Tables

I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert it to LaTeX (for publication). Is there any package that does this in Python? Something…
Titanic
23
votes
6 answers

Simple multidimensional curve fitting

I have a bunch of data, generally in the form a, b, c, ..., y where y = f(a, b, c...). Most of the data sets have three or four variables and 10k - 10M records. My general assumption is that they are algebraic in nature, something like: y = P1 a^E1 +…
user64258
23
votes
3 answers

Getting glmnet coefficients at 'best' lambda

I am using the following code with glmnet: > library(glmnet) > fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1]) > plot(fit, xvar='lambda') However, I want to print out the coefficients at the best lambda, as is done in ridge regression. I see…
rnso
23
votes
1 answer

How to export coefficients of the regression analysis to a spreadsheet or csv file?

I am new to RStudio and I guess my question is pretty easy to solve but a lot of searching did not help me. I am running a regression and summary(regression1) shows me all the coefficients and so on. Now I am using coef(regression1) so it only…
OST_EE
23
votes
3 answers

sklearn LogisticRegression without regularization

Logistic regression class in sklearn comes with L1 and L2 regularization. How can I turn off regularization to get the "raw" logistic fit such as in glmfit in Matlab? I think I can set C = large number but I don't think it is wise. see for more…
Hanan Shteingart
23
votes
3 answers

R logistic regression area under curve

I am performing logistic regression using this page. My code is as below. mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv") mylogit <- glm(admit ~ gre, data = mydata, family =…
user2543622
23
votes
5 answers

PCA first or normalization first?

When doing regression or classification, what is the correct (or better) way to preprocess the data? 1. Normalize the data -> PCA -> training 2. PCA -> normalize PCA output -> training 3. Normalize the data -> PCA -> normalize PCA output -> training Which…
AlanS
22
votes
1 answer

Logistic regression - cbind command in glm

I am doing logistic regression in R. Can somebody clarify what the difference is between running these two lines? 1. glm(Response ~ Temperature, data=temp, family = binomial(link="logit")) 2. glm(cbind(Response, n - Response) ~…
Eddie
22
votes
1 answer

ggplot2: Logistic Regression - plot probabilities and regression line

I have a data.frame containing a continuous predictor and a dichotomous response variable.

> head(df)
  position response
1        0        1
2        3        1
3       -4        0
4       -1        0
5       -2        1
6        0        0

I can…
vincentqu
22
votes
6 answers

Is it ok to define your own cost function for logistic regression?

In least-squares models, the cost function is defined as the square of the difference between the predicted value and the actual value as a function of the input. When we do logistic regression, we change the cost function to be a logarithmic…
London guy
22
votes
2 answers

Multivariate (polynomial) best fit curve in python?

How do you calculate a best-fit line in Python, and then plot it on a scatterplot in matplotlib? I calculate the linear best-fit line using ordinary least squares regression as follows: from sklearn import linear_model clf =…
Zach
21
votes
1 answer

Loss suddenly increases with Adam Optimizer in Tensorflow

I am using a CNN for a regression task. I use TensorFlow and the optimizer is Adam. The network seems to converge perfectly fine until a point where the loss suddenly increases along with the validation error. Here are the loss plots of the labels…
21
votes
2 answers

Specifying formula in R with glm without explicit declaration of each covariate

I would like to force specific variables into glm regressions without fully specifying each one. My real data set has ~200 variables. I haven't been able to find samples of this in my online searching thus far. For example (with just 3…
S.R.