Questions tagged [linear-regression]

For questions about the linear regression modelling approach

Linear regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are related stochastically rather than deterministically.

Example

Height and age are probabilistically distributed over humans, and they are stochastically related: knowing that a person is aged 30 influences the chance of that person being 4 feet tall, and knowing that a person is aged 13 influences the chance of that person being 6 feet tall.

Model 1

height_i = b0 + b1 * age_i + ε_i, where b0 is the intercept, b1 is the coefficient by which age is multiplied to predict height, ε_i is the error term, and i indexes the subject

Model 2

height_i = b0 + b1 * age_i + b2 * sex_i + ε_i, where the variable sex is dichotomous

In linear regression, user data X is modelled using a linear function Y, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional data can be represented as:

Y = w1 x1 + w2 x2 + ... + wk xk

Further reading: Leo Breiman, Statistical Modeling: The Two Cultures — http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, free software for statistical computing and graphics, the function lm implements linear regression.
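The model above can be sketched in a few lines. This is a minimal illustration with synthetic data (the weights and sample sizes are made up for the demo), estimating the unknown parameters W by ordinary least squares with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3, 200
X = rng.normal(size=(n, k))                 # n observations of k-dimensional data
w_true = np.array([1.5, -2.0, 0.5])         # "unknown" parameters W, chosen for the demo
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy linear response Y

# least-squares estimate of W from the data
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With enough data and small noise, `w_hat` recovers the generating weights closely.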

6517 questions
14
votes
2 answers

Use "colon" between two characters as a regressor in lm()

What does it mean when we put a colon : between two characters? I'm sure it's not saying from character A to character B. Here is the code: fit9=lm(Sales~.+Income:Advertising+Price:Age,data=Carseats) Coefficients: Estimate …
Sheryl
  • 721
  • 1
  • 9
  • 17
14
votes
3 answers

How to use formula in R to exclude main effect but retain interaction

I do not want the main effect because it is collinear with a finer factor fixed effect, so it is annoying to have these NAs. In this example: lm(y ~ x * z) I want the interaction of x (numeric) and z (factor), but not the main effect of z.
wolfsatthedoor
  • 7,163
  • 18
  • 46
  • 90
14
votes
3 answers

Can we use the Normal Equation for Logistic Regression?

Just like we use the Normal Equation to find the optimum theta values in Linear Regression, can/can't we use a similar formula for Logistic Regression? If not, why? I'd be grateful if someone could explain the reasoning behind it. Thank…
user2125722
  • 1,289
  • 3
  • 18
  • 29
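There is no single closed form for logistic regression because the log-likelihood is nonlinear in the parameters; the standard fix is Newton's method (IRLS), where every iteration solves a *weighted* normal equation. A minimal NumPy sketch on synthetic data (the data, coefficients, and function name here are illustrative, not from any library):

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Newton's method (IRLS) for logistic regression: each step solves
    a weighted normal equation; no one-shot closed form exists."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted probabilities
        W = p * (1.0 - p)                   # IRLS weights (diagonal)
        grad = X.T @ (y - p)                # gradient of the log-likelihood
        H = X.T @ (X * W[:, None])          # (negative) Hessian
        w = w + np.linalg.solve(H, grad)    # one Newton step
    return w

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
w_true = np.array([-0.5, 2.0])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
w_hat = logistic_irls(X, y)
```

Contrast with linear regression, where `np.linalg.solve(X.T @ X, X.T @ y)` finishes in one step.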
14
votes
3 answers

OLS using statsmodel.formula.api versus statsmodel.api

Can anyone explain to me the difference between ols in statsmodel.formula.api versus ols in statsmodel.api? Using the Advertising data from the ISLR text, I ran an ols using both, and got different results. I then compared with scikit-learn's…
Chetan Prabhu
  • 580
  • 3
  • 6
  • 10
14
votes
1 answer

Converting Numpy Lstsq residual value to R^2

I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. Numpy returns an unscaled residual value; what would be a sensible way of normalizing…
whatnick
  • 5,400
  • 3
  • 19
  • 35
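One sensible normalization, under the standard definition, is R^2 = 1 − SS_res / SS_tot: the `residuals` array that `np.linalg.lstsq` returns is exactly SS_res, so it only needs to be divided by the total sum of squares about the mean. A small sketch with made-up data points:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 5.0])

A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

ss_res = residuals[0]                            # unscaled residual from lstsq
ss_tot = np.sum((y - y.mean()) ** 2)             # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
```

Note that `residuals` is non-empty only when the system is overdetermined and the design matrix has full column rank; for a univariate fit with an intercept, R^2 equals the squared correlation of x and y, which makes a handy cross-check.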
14
votes
1 answer

sklearn LinearRegression, why only one coefficient returned by the model?

I'm trying out the scikit-learn LinearRegression model on a simple dataset (it comes from an Andrew Ng Coursera course; it doesn't really matter, see the plot for reference) this is my script import numpy as np import matplotlib.pyplot as plt from…
JackNova
  • 3,911
  • 5
  • 31
  • 49
14
votes
4 answers

R-squared on test data

I fit a linear regression model on 75% of my data set that includes ~11000 observations and 143 variables: gl.fit <- lm(y[1:ceiling(length(y)*(3/4))] ~ ., data= x[1:ceiling(length(y)*(3/4)),]) #3/4 for training and I got an R^2 of 0.43. I then…
H_A
  • 667
  • 2
  • 6
  • 13
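A point worth keeping in mind for questions like this: R^2 on held-out data uses the *test* set's mean in SS_tot, and unlike training R^2 it can legitimately be negative when the model predicts worse than the test mean. A sketch of the computation on synthetic data (the 75/25 split mirrors the question; everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
beta = rng.normal(size=5)
y = X @ beta + rng.normal(size=1000)

n_train = 750                                    # 75% train / 25% test
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
pred = Xte @ coef
r2_test = 1 - np.sum((yte - pred) ** 2) / np.sum((yte - yte.mean()) ** 2)
```

A large gap between training and test R^2 (as in the question) usually points to overfitting, here made unlikely by having far more observations than predictors.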
14
votes
1 answer

"weighted" regression in R

I have created a script like the one below to do something I call "weighted" regression: library(plyr) set.seed(100) temp.df <- data.frame(uid=1:200, bp=sample(x=c(100:200),size=200,replace=TRUE), …
lokheart
  • 23,743
  • 39
  • 98
  • 169
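The generic trick behind weighted least squares, in any language, is to multiply both the design matrix and the response by the square root of each observation's weight and then fit by ordinary least squares. A minimal NumPy sketch with synthetic data and weights (all values invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
w = rng.uniform(0.5, 2.0, size=200)              # per-observation weights
y = 2.0 + 3.0 * x + rng.normal(size=200) / np.sqrt(w)

A = np.column_stack([np.ones_like(x), x])        # design matrix [1, x]
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
# coef[0] is the intercept, coef[1] the slope
```

In R the same thing is `lm(y ~ x, weights = w)`, which handles the rescaling internally.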
13
votes
1 answer

Fast pairwise simple linear regression between variables in a data frame

I have seen pairwise or general paired simple linear regression many times on Stack Overflow. Here is a toy dataset for this kind of problem. set.seed(0) X <- matrix(runif(100), 100, 5, dimnames = list(1:100, LETTERS[1:5])) b <- c(1, 0.7, 1.3, 2.9,…
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
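The fast route for this kind of problem, in any language, is to avoid one model-fitting call per column and use the closed form for a simple regression slope, cov(x_j, y) / var(x_j), vectorized over all columns at once. A NumPy sketch mirroring the question's toy setup (coefficients truncated in the excerpt are replaced by invented ones):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 5))
y = X @ np.array([1.0, 0.7, 1.3, 2.9, -2.0]) + rng.normal(size=100)

Xc = X - X.mean(axis=0)                 # center each regressor
yc = y - y.mean()
slopes = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
intercepts = y.mean() - slopes * X.mean(axis=0)
```

Each (slope, intercept) pair matches a separate simple regression of y on that single column, but the whole batch costs one pass over the matrix.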
13
votes
2 answers

Multiple Linear Regression with specific constraint on each coefficients on Python

I am currently running a multiple linear regression on a dataset. At first, I didn't realize I needed to put constraints on my weights; as a matter of fact, I need to have specific positive & negative weights. To be more precise, I am doing a…
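Sign constraints on individual coefficients turn this into box-constrained least squares. In practice one would reach for a solver with built-in bounds (scipy's `lsq_linear`, for instance), but the idea can be sketched in plain NumPy with projected gradient descent; everything below (function name, data, bounds) is illustrative:

```python
import numpy as np

def constrained_lstsq(X, y, lower, upper, n_iter=5000):
    """Least squares with per-coefficient box constraints via
    projected gradient descent (a sketch, not a production solver)."""
    lr = 1.0 / np.linalg.norm(X, 2) ** 2        # safe step from the spectral norm
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = np.clip(w - lr * grad, lower, upper)  # project back into the box
    return w

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = X @ np.array([2.0, -1.5, 0.8]) + 0.1 * rng.normal(size=300)

# force coefficient 0 positive, coefficient 1 negative, coefficient 2 free
w = constrained_lstsq(X, y, lower=[0, -np.inf, -np.inf], upper=[np.inf, 0, np.inf])
```

When the unconstrained optimum already satisfies the bounds (as here), the result coincides with ordinary least squares; otherwise the active constraints pin the offending coefficients to zero.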
13
votes
1 answer

Why do `sklearn` and `statsmodels` implementations of OLS regression give different R^2?

I noticed by accident that OLS models implemented by sklearn and statsmodels yield different values of R^2 when not fitting an intercept. Otherwise they seem to work fine. The following code yields: import numpy as np import sklearn import…
abukaj
  • 2,582
  • 1
  • 22
  • 45
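The usual culprit in this discrepancy is the definition of SS_tot when there is no intercept: statsmodels documents that it switches to the *uncentered* total sum of squares (about 0) for models without a constant, while sklearn's score always centers about the mean. Both definitions can be computed directly in NumPy to see the gap (data invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 2, size=50)
y = 3.0 * x + 0.3 * rng.normal(size=50)

slope = (x @ y) / (x @ x)               # OLS through the origin (no intercept)
resid = y - slope * x

ss_res = resid @ resid
r2_centered = 1 - ss_res / ((y - y.mean()) @ (y - y.mean()))  # sklearn-style
r2_uncentered = 1 - ss_res / (y @ y)                          # statsmodels-style
```

The uncentered version is always at least as large here, because SS_tot about zero exceeds SS_tot about the mean.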
13
votes
1 answer

How does `poly()` generate orthogonal polynomials? How to understand the "coefs" returned?

My understanding of orthogonal polynomials is that they take the form y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired, where a1, a2, etc. are the coefficients of each orthogonal term (vary…
pyg
  • 716
  • 6
  • 18
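The core construction (this sketch assumes it mirrors the QR approach R's poly() is documented to use) is: build the Vandermonde matrix of the centered x, take its QR decomposition, and keep the orthonormal columns beyond the constant one. In NumPy:

```python
import numpy as np

def ortho_poly(x, degree):
    """Orthonormal polynomial basis in x up to the given degree,
    built from a QR decomposition of the centered Vandermonde matrix
    (a sketch of the idea behind R's poly())."""
    x = np.asarray(x, dtype=float)
    V = np.vander(x - x.mean(), degree + 1, increasing=True)  # 1, x, x^2, ...
    Q, R = np.linalg.qr(V)
    return Q[:, 1:]                     # drop the constant column, as poly() does

x = np.arange(10, dtype=float)
P = ortho_poly(x, 3)
```

The columns of `P` are mutually orthogonal with unit norm; R's "coefs" attribute stores the centering and scaling constants so the same basis can be reproduced on new data at prediction time.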
13
votes
2 answers

Why is the built-in lm function so slow in R?

I always thought that the lm function was extremely fast in R, but as this example would suggest, the closed-form solution computed using the solve function is much faster. data<-data.frame(y=rnorm(1000),x1=rnorm(1000),x2=rnorm(1000)) X =…
adaien
  • 1,932
  • 1
  • 12
  • 26
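The gap the question observes is reproducible in any language: solving the normal equations X'X b = X'y directly is fast but numerically less stable than the QR/SVD route that lm (and, for example, np.linalg.lstsq) takes, and lm additionally builds the model frame and inference quantities. A NumPy sketch of the two routes on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
y = rng.normal(size=1000)

b_normal = np.linalg.solve(X.T @ X, X.T @ y)    # closed-form normal equations
b_qr, *_ = np.linalg.lstsq(X, y, rcond=None)    # QR/SVD-based, more stable
```

On a well-conditioned problem like this the two agree to machine precision; the stable route earns its overhead only when X is close to rank-deficient.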
13
votes
2 answers

Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome)

I've fitted a logistic regression model that predicts the binary outcome vs from mpg (mtcars dataset). The plot is shown below. How can I determine the mpg value for any particular vs value? For example, I'm interested in finding out what the mpg…
hsl
  • 670
  • 2
  • 10
  • 22
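Since a logistic fit is linear on the logit scale, the inverse prediction has a closed form: x = (logit(p) - b0) / b1. A tiny sketch (the coefficients below are hypothetical placeholders, not the actual mtcars fit):

```python
import numpy as np

def x_for_probability(p, b0, b1):
    """Invert logit(p) = b0 + b1 * x for x."""
    return (np.log(p / (1.0 - p)) - b0) / b1

# hypothetical coefficients for illustration only
b0, b1 = -8.0, 0.4
x50 = x_for_probability(0.5, b0, b1)    # x at which the predicted p is 0.5
```

At p = 0.5 the logit is zero, so x50 = -b0 / b1; plugging x50 back into the model returns the target probability.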
13
votes
4 answers

How to do linear regression, taking errorbars into account?

I am doing a computer simulation for some physical system of finite size, and afterwards I am extrapolating to infinity (the thermodynamic limit). Some theory says that the data should scale linearly with system size, so I am doing linear…
Vladimir
  • 369
  • 1
  • 3
  • 12
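When each data point comes with its own error bar sigma_i, the standard approach is weighted least squares with weight 1/sigma_i^2. With np.polyfit this means passing w = 1/sigma, since its documented `w` argument multiplies the residuals and therefore expects 1/sigma rather than 1/sigma^2. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 30)
sigma = rng.uniform(0.05, 0.3, size=30)          # per-point error bars
y = 1.0 + 2.0 * x + sigma * rng.normal(size=30)  # noise scaled by each sigma

# weighted linear fit: polyfit's w is 1/sigma, not 1/sigma**2
slope, intercept = np.polyfit(x, y, 1, w=1.0 / sigma)
```

Points with small error bars then dominate the fit, which is exactly what an extrapolation to the thermodynamic limit should want.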