Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression formalizes relationships between variables as mathematical equations. It describes how one or more random variables are related to one or more other variables. The relationship between the variables is stochastic rather than deterministic.

Example

Height and age vary across people and are stochastically related: knowing that a person is 30 years old changes the probability that this person is 4 feet tall, and knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

height_i = b_0 + b_1 age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient that age is multiplied by to obtain a prediction of height, ε_i is the error term, and i indexes the subject.
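
As a rough illustration of Model 1, the sketch below simulates ages and heights and recovers the intercept and slope with the closed-form estimates for a single predictor. The data, the seed, and the generating values (intercept 80, slope 6, noise standard deviation 5) are invented for this example and do not come from the tag description; only NumPy is assumed.

```python
import numpy as np

# Hypothetical data: ages in years, heights in cm. The generating values
# (intercept 80, slope 6, noise sd 5) are made up for this sketch.
rng = np.random.default_rng(0)
age = rng.uniform(2, 16, size=500)
height = 80.0 + 6.0 * age + rng.normal(0.0, 5.0, size=500)

# Closed-form least-squares estimates for one predictor:
#   b1 = cov(age, height) / var(age)
#   b0 = mean(height) - b1 * mean(age)
b1 = np.cov(age, height, ddof=1)[0, 1] / np.var(age, ddof=1)
b0 = height.mean() - b1 * age.mean()

# Residuals are the sample estimates of the error term for each subject.
residuals = height - (b0 + b1 * age)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The estimates should land near the generating values but not exactly on them, which is the stochastic (rather than deterministic) relationship the model describes.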

Model 2

height_i = b_0 + b_1 age_i + b_2 sex_i + ε_i, where the variable sex is dichotomous.
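
Model 2 can be sketched the same way by putting the predictors into a design matrix and solving the least-squares problem. This is a minimal example under stated assumptions: the dichotomous variable is coded 0/1, and the simulated data and generating values (80, 6, 4, noise standard deviation 5) are hypothetical.

```python
import numpy as np

# Hypothetical data; sex is coded 0/1, which is an assumed coding.
rng = np.random.default_rng(1)
n = 400
age = rng.uniform(2, 16, size=n)
sex = rng.integers(0, 2, size=n)
height = 80.0 + 6.0 * age + 4.0 * sex + rng.normal(0.0, 5.0, size=n)

# Design matrix: a column of ones for the intercept, then age and sex.
X = np.column_stack([np.ones(n), age, sex])

# Least-squares solution of height ≈ X @ b.
b, *_ = np.linalg.lstsq(X, height, rcond=None)
b0, b1, b2 = b
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

With 0/1 coding, the third coefficient is the estimated difference in mean height between the two groups at a fixed age.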

In linear regression, a response Y is modelled as a linear function of the data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional data can be written as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
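
One common way to estimate the parameters of this k-dimensional model is ordinary least squares via the normal equations. The sketch below is only an illustration: the data are simulated without noise, and the parameter values in `w_true` are hypothetical, chosen so that the recovered weights can be checked against them.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = rng.normal(size=(n, k))           # n observations of k-dimensional data
w_true = np.array([1.5, -2.0, 0.5])   # hypothetical "unknown" parameters
Y = X @ w_true                        # noiseless, so the fit should be exact

# Normal equations: (X^T X) w = X^T Y
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(w_hat)
```

The model as written has no intercept term; appending a column of ones to `X` would add one. For ill-conditioned data, `np.linalg.lstsq` is usually a safer choice than solving the normal equations directly.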

Further reading: Leo Breiman, "Statistical Modeling: The Two Cultures", http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, the software environment for statistical computing and graphics, the function lm (see the lm tag) implements linear regression.

6517 questions

2 answers

Vector autoregressive model fitting with scikit-learn

I am trying to fit vector autoregressive (VAR) models using the generalized linear model fitting methods included in scikit-learn. The linear model has the form y = X w, but the system matrix X has a very peculiar structure: it is block-diagonal,…

python machine-learning scikit-learn linear-regression model-fitting

asked Dec 19 '13 at 12:02

MB-F

4 answers

segmented linear regression in python

Is there a library in Python to do segmented linear regression? I'd like to fit multiple lines to my data automatically to get something like this: By the way, I do know the number of segments.

python linear-regression

asked Jan 25 '12 at 07:55

P3trus

3 answers

drop_first=True during dummy variable creation in pandas

I have month data (Jan, Feb, Mar, etc.) in my dataset and I am generating dummy variables using the pandas library: pd.get_dummies(df['month'], drop_first=True). I want to understand whether I should use drop_first=True in this case. Why is it…

python linear-regression

asked Aug 30 '20 at 19:17

Snehal Gupta

1 answer

Python: Fastest way to perform millions of simple linear regressions with only 1 exogenous variable

I am performing component-wise regression on time series data. That is, instead of regressing y against x1, x2, ..., xN, we would regress y against x1 only, y against x2 only, ..., and take the regression that reduces the sum of…

python numpy regression linear-regression statsmodels

asked Jun 26 '20 at 09:43

Lim Kaizhuo

1 answer

How to remove RuntimeWarning errors from code?

I keep getting a RuntimeWarning when I run the regression code at the very bottom. I am not sure how to fix it. I believe it may be the attencoef list, because there are some NaN values in it. Any suggestions? These are the errors I am…

python numpy linear-regression compiler-warnings spyder

asked Jul 05 '17 at 18:28

Adam

2 answers

statsmodels add_constant for OLS intercept, what is this actually doing?

Reviewing linear regressions via statsmodels OLS fit I see you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However my only understanding of intercepts in this context would be the…

python linear-regression statsmodels

asked Dec 31 '16 at 02:08

Tim Lindsey

1 answer

Multivariate Regression Neural Network Loss Function

I am doing multivariate regression with a fully connected multilayer neural network in Tensorflow. The network predicts 2 continuous float variables (y1,y2) given an input vector (x1,x2,...xN), i.e. the network has 2 output nodes. With 2 outputs the…

neural-network tensorflow linear-regression

asked Jul 17 '16 at 22:41

Ron Cohen

2 answers

Print OLS regression summary to text file

I am running OLS regression using pandas.stats.api.ols using a groupby with the following code: from pandas.stats.api import ols df=pd.read_csv(r'F:\file.csv') result=df.groupby(['FID']).apply(lambda d: ols(y=d.loc[:, 'MEAN'], x=d.loc[:,…

python csv pandas linear-regression statsmodels

asked Apr 01 '16 at 16:07

Stefano Potter

2 answers

Optimal two variable linear regression calculation

Problem Am looking to apply the y = mx + b equation (where m is SLOPE, b is INTERCEPT) to a data set, which is retrieved as shown in the SQL code. The values from the (MySQL) query are: SLOPE = 0.0276653965651912 INTERCEPT = -57.2338357550468 SQL…

mysql sql statistics linear-regression

asked May 09 '10 at 20:23

Dave Jarvis

2 answers

Create lm object from data/coefficients

Does anyone know of a function that can create an lm object given a dataset and coefficients? I'm interested in this because I started playing with Bayesian model averaging (BMA) and I'd like to be able to create an lm object out of the results of…

r linear-regression

asked Jan 14 '10 at 04:37

Bob Albright

3 answers

scikit-learn & statsmodels - which R-squared is correct?

I'd like to choose the best algorithm for future. I found some solutions, but I didn't understand which R-Squared value is correct. For this, I divided my data into two as test and training, and I printed two different R squared values…

python machine-learning scikit-learn linear-regression statsmodels

asked Feb 10 '19 at 07:04

Mert Yanık

3 answers

plot regression line in R

I want to plot a simple regression line in R. I've entered the data, but the regression line doesn't seem to be right. Can someone help? x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120) y <- c(10, 18, 25, 29, 30, 28, 25, 22, 18, 15, 11,…

r plot regression linear-regression lm

asked Sep 28 '16 at 01:56

J.doe

1 answer

how to create DataFrame from multiple arrays in Spark Scala?

val tvalues: Array[Double] = Array(1.866393526974307, 2.864048126935307, 4.032486069215076, 7.876169953355888, 4.875333799256043, 14.316322626848278) val pvalues: Array[Double] = Array(0.064020056478447, 0.004808399479386827, 8.914865448939047E-5,…

arrays scala linear-regression apache-spark-sql

asked May 11 '16 at 05:03

Sam

3 answers

Linear Regression with positive coefficients in Python

I'm trying to find a way to fit a linear regression model with positive coefficients. The only way I found is sklearn's Lasso model, which has a positive=True argument, but the documentation doesn't recommend using it with alpha=0 (meaning no other constraints on the…

python machine-learning scikit-learn linear-regression

asked Mar 14 '16 at 11:43

Oren

1 answer

Multi Collinearity for Categorical Variables

For Numerical/Continuous data, to detect Collinearity between predictor variables we use the Pearson's Correlation Coefficient and make sure that predictors are not correlated among themselves but are correlated with the response variable. But How…

r statistics linear-regression

asked Oct 28 '15 at 17:29

karthik subramanian
