Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary deterministically with the independent variables.
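As a minimal sketch of the idea above, the simplest case is one independent variable x and a dependent variable Y whose mean is assumed to be `intercept + slope * x`. The function name `fit_simple_linear_regression` and the simulated data (true intercept 1.0, true slope 2.0, Gaussian noise) are assumptions made for illustration; they do not come from any particular package.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Simulated data (hypothetical): the mean of Y varies deterministically with x,
# and the observed values scatter around that mean with Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")

# Predict the mean of Y at a new x value.
new_x = 4.0
print(f"predicted mean of Y at x={new_x}: {intercept + slope * new_x:.3f}")
```

With more than one independent variable, the same least-squares idea is usually handled by a library routine rather than written by hand.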

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the StackExchange site for statistics and machine learning.

9532 questions
13
votes
1 answer

How does `poly()` generate orthogonal polynomials? How to understand the "coefs" returned?

My understanding of orthogonal polynomials is that they take the form y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired where a1, a2 etc are coefficients to each orthogonal term (vary…
pyg
  • 716
  • 6
  • 18
13
votes
2 answers

Why is the built-in lm function so slow in R?

I always thought that the lm function was extremely fast in R, but as this example would suggest, the closed-form solution computed using the solve function is much faster. data<-data.frame(y=rnorm(1000),x1=rnorm(1000),x2=rnorm(1000)) X =…
adaien
  • 1,932
  • 1
  • 12
  • 26
13
votes
2 answers

Regression analysis in MySQL

Introduction: In my project I'm saving FacebookPages and their like count, as well as the like count per country. I have a table for the FacebookPages, one for the languages, one for the correlation between the facebook page and the language (and…
Musterknabe
  • 5,763
  • 14
  • 61
  • 117
13
votes
2 answers

Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome)

I've fitted a logistic regression model that predicts the binary outcome vs from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg…
hsl
  • 670
  • 2
  • 10
  • 22
13
votes
4 answers

large-scale regression in R with a sparse feature matrix

I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space---e.g., ~1k non-zero features per example. It seems like the SparseM package slm should do…
jhofman
  • 568
  • 1
  • 7
  • 15
13
votes
4 answers

loess predict with new x values

I am attempting to understand how the predict.loess function is able to compute new predicted values (y_hat) at points x that do not exist in the original data. For example (this is a simple example and I realize loess is obviously not needed for…
Alex
  • 19,533
  • 37
  • 126
  • 195
12
votes
1 answer

Python: Fastest way to perform millions of simple linear regression with 1 exogenous variable only

I am performing component-wise regression on time series data. This is basically where instead of regressing y against x1, x2, ..., xN, we would regress y against x1 only, y against x2 only, ..., and take the regression that reduces the sum of…
Lim Kaizhuo
  • 714
  • 3
  • 7
  • 16
12
votes
9 answers

Finding coefficients for logistic regression

I'm working on a classification problem and need the coefficients of the logistic regression equation. I can find the coefficients in R but I need to submit the project in Python. How do I get the coefficient values in scikit-learn?
MonkeyDLuffy
  • 508
  • 1
  • 5
  • 24
12
votes
3 answers

how to get standardised (Beta) coefficients for multiple linear regression using statsmodels

When using the .summary() function in statsmodels, the OLS Regression Results include the following fields: coef, std err, t, P>|t|, [0.025, 0.975]. How can I get the standardised coefficients (which exclude the…
Andreuccio
  • 1,053
  • 2
  • 18
  • 32
12
votes
1 answer

Regularization strategy in Keras

I am trying to set up a non-linear regression problem in Keras. Unfortunately, results show that overfitting is occurring. Here is the code: model = Sequential() model.add(Dense(number_of_neurons, input_dim=X_train.shape[1], activation='relu',…
trumee
  • 393
  • 1
  • 4
  • 11
12
votes
2 answers

Using LASSO in R with categorical variables

I've got a dataset with 1000 observations and 76 variables, about twenty of which are categorical. I want to use LASSO on this entire data set. I know that having factor variables doesn't really work in LASSO through either lars or glmnet, but the…
Alex
  • 121
  • 1
  • 1
  • 3
12
votes
2 answers

What does predict.glm(, type="terms") actually do?

I am confused by the way the predict.glm function in R works. According to the help, The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale. Thus, if my model has form f(y) =…
David Dale
  • 10,958
  • 44
  • 73
12
votes
1 answer

xgboost binary logistic regression

I am having problems running logistic regression with xgboost that can be summarized in the following example. Let's assume I have a very simple dataframe with two predictors and one target variable: df= pd.DataFrame({'X1' : pd.Series([1,0,0,1]),…
12
votes
2 answers

Difference between the interaction : and * term for formulas in StatsModels OLS regression

Hi, I'm learning StatsModels and can't figure out the difference between : and * (interaction terms) for formulas in StatsModels OLS regression. Could you please give me a hint to figure this out? Thank you! The…
user3368526
  • 2,168
  • 10
  • 37
  • 52
12
votes
4 answers

Test labels for regression caffe, float not allowed?

I am doing regression using caffe, and my test.txt and train.txt files are like this: /home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0 /home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6 …
Deven
  • 617
  • 2
  • 6
  • 20