Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically, the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
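
As a rough illustration of the description above, the simplest case is a straight-line fit of Y on a single independent variable by ordinary least squares. The sketch below uses only the Python standard library; the function names `fit_simple_ols` and `r_squared` and the sample data are made up for this example and do not come from any particular package.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Fit y ≈ intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration; returns (intercept, slope).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up sample data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
r2 = r_squared(x, y, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.4f}")
```

Here the slope estimates how much Y changes per unit change in the independent variable, and R² is one common summary of how strong the fitted relationship is. Real analyses would normally rely on an established package rather than a hand-written fit.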

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
21
votes
3 answers

logit regression and singular Matrix error in Python

I am trying to run a logit regression on the German credit data (www4.stat.ncsu.edu/~boos/var.select/german.credit.html). To test the code, I have used only numerical variables and tried regressing it with the result using the following code. import pandas…
user3122731
  • 211
  • 1
  • 2
  • 4
21
votes
2 answers

Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

I have a regression model with binary outcome. I fitted the model with glmnet and got the selected variables and their coefficients. Since glmnet doesn't calculate variable importance, I would like to feed the exact output (selected variables and…
TongZZZ
  • 756
  • 2
  • 8
  • 20
20
votes
2 answers

Newey-West standard errors with Mean Groups/Fama-MacBeth estimator

I'm trying to get Newey-West standard errors to work with the output of pmg() (Mean Groups/Fama-MacBeth estimator) from the plm package. Following the example from here: require(foreign) require(plm) require(lmtest) test <-…
cocquemas
  • 1,149
  • 8
  • 17
20
votes
3 answers

How do I print the variance of an lm in R without computing from the Standard Error by hand?

Simple question really! I am running lots of linear regressions of y~x and want to obtain the variance for each regression without computing it by hand from the Standard Error output given in the summary.lm command. Just to save a bit of time…
Sarah
  • 789
  • 3
  • 12
  • 29
20
votes
4 answers

Partial Least Squares Library

There was already a question like this, but it was not answered, so I am posting it again. Does anyone know of an open-source implementation of a partial least squares algorithm in C++ (or C)? Or maybe a library that does it?
ISTB
  • 1,799
  • 3
  • 22
  • 31
19
votes
1 answer

mgcv: How to set number and / or locations of knots for splines

I want to use the gam function in the mgcv package: x <- seq(0,60, len =600) y <- seq(0,1, len=600) prova <- gam(y ~ s(x, bs='cr')) Can I set the number of knots in s()? And how can I find out where the knots used by the spline are? Thanks!
memy
  • 229
  • 1
  • 2
  • 8
19
votes
1 answer

How to Use Lagged Time-Series Variables in a Python Pandas Regression Model?

I'm creating time-series econometric regression models. The data is stored in a Pandas data frame. How can I do lagged time-series econometric analysis using Python? I have used Eviews in the past (which is a standalone econometric program i.e. not…
Steve Maughan
  • 1,174
  • 3
  • 19
  • 30
19
votes
5 answers

LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

If I try to run the script below I get the error: LinAlgError: SVD did not converge in Linear Least Squares. I have used the exact same script on a similar dataset and there it works. I have tried to search for values in my dataset that Python might…
Toine Kerckhoffs
  • 293
  • 2
  • 4
  • 11
18
votes
1 answer

How to calculate the double integration in R

This is my R code to calculate beta values for each case, which is pretty simple: data =data.frame( "t" = seq(0, 1, 0.001) ) B3t <- function(t){ t**3 - 1.6*t**2 +0.76*t+1 } B2t <- function(t){ ifelse(t >= 0 & t < 0.342, …
Stupid_Intern
  • 3,382
  • 8
  • 37
  • 74
18
votes
4 answers

Keras - How to perform a prediction using KerasRegressor?

I am new to machine learning, and I am trying to handle Keras to perform regression tasks. I have implemented this code, based on this example. X =…
Simone
  • 4,800
  • 12
  • 30
  • 46
18
votes
2 answers

forward stepwise regression

In R stepwise forward regression, I specify a minimal model and a set of variables to add (or not to add): min.model = lm(y ~ 1) fwd.model = step(min.model, direction='forward', scope=(~ x1 + x2 + x3 + ...)) Is there any way to specify using all…
Michael Schubert
  • 2,726
  • 4
  • 27
  • 49
18
votes
2 answers

Ignoring missing values in multiple OLS regression with statsmodels

I'm trying to run a multiple OLS regression using statsmodels and a pandas dataframe. There are missing values in different columns for different rows, and I keep getting the error message: ValueError: array must not contain infs or NaNs I saw this…
user2649353
  • 367
  • 2
  • 3
  • 9
18
votes
1 answer

Extract only coefficients whose p values are significant from a logistic model

I have run a logistic regression, the summary of which I name "score". Accordingly, summary(score) gives me the following: Deviance Residuals: Min 1Q Median 3Q Max -1.3616 -0.9806 -0.7876 1.2563 1.9246 …
Jonathan Charlton
  • 1,975
  • 6
  • 23
  • 30
17
votes
1 answer

Neural Network Ordinal Classification for Age

I have created a simple neural network (Python, Theano) to estimate a person's age based on their spending history from a selection of different stores. Unfortunately, it is not particularly accurate. The accuracy might be hurt by the fact that the…
17
votes
1 answer

plot.lm(): extracting numbers labelled in the diagnostic Q-Q plot

For the simple example below, you can see that there are certain points that are identified in the ensuing plots. How can I extract the row numbers identified in these plots, especially the Normal Q-Q plot? set.seed(2016) maya <-…
Reuben Mathew
  • 598
  • 4
  • 22