Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

$\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the parameter that age is multiplied by to get a prediction of height, $\varepsilon_i$ is the error term, and $i$ indexes the subject

Model 2

$\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous
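
As a rough illustration of how the parameters of Model 1 could be estimated, the sketch below applies the closed-form ordinary least squares formulas for a single predictor. The data values and the helper name `fit_simple_ols` are made up for this example and are not part of the tag description.

```python
# Illustrative data only: ages in years, heights in cm (made-up values).
ages = [2, 4, 6, 8, 10, 12]
heights = [86.0, 102.0, 115.0, 128.0, 139.0, 150.0]


def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals.

    Hypothetical helper: closed-form OLS for one predictor.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # The fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_simple_ols(ages, heights)
# Residuals are the estimated error terms, one per subject.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

Because the model includes an intercept, the residuals sum to zero up to floating-point error. Model 2 adds a second predictor and would normally be fitted with a matrix-based least squares routine instead.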

In linear regression, the output Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional user data can be represented as:

$Y = w_1 x_1 + w_2 x_2 + \dots + w_k x_k$
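
One way the weights of such a k-dimensional model could be estimated is by solving the normal equations (XᵀX) w = Xᵀy. The sketch below does this in plain Python on made-up data; the function names `predict`, `solve`, and `fit_least_squares` are hypothetical, and it assumes XᵀX is invertible. In practice a library routine such as `numpy.linalg.lstsq` is usually the safer choice.

```python
def predict(w, x):
    """Linear model: Y = w1*x1 + w2*x2 + ... + wk*xk."""
    return sum(wi * xi for wi, xi in zip(w, x))


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting.

    Assumes the square matrix a is non-singular.
    """
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_least_squares(rows, y):
    """Estimate w from the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noise-free data generated from known weights, so the fit
# should recover them up to floating-point error.
true_w = [2.0, -1.0, 0.5]
rows = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 4.0],
        [3.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
y = [predict(true_w, r) for r in rows]
w_hat = fit_least_squares(rows, y)
print("estimated weights:", w_hat)
```

With noisy real data the recovered weights would only approximate the underlying relationship, and forming XᵀX explicitly can be numerically fragile when predictors are strongly correlated.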

Further reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, scientific software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
27
votes
3 answers

How to get the P Value in a Variable from OLSResults in Python?

The OLSResults of df2 = pd.read_csv("MultipleRegression.csv") X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']] Y = df2['Price'] X = add_constant(X) fit = sm.OLS(Y, X).fit() print(fit.summary()) shows the P values of each attribute to only…
Addzy K
  • 715
  • 1
  • 7
  • 11
26
votes
2 answers

geom_smooth in ggplot2 not working/showing up

I am trying to add a linear regression line to my graph, but when it's run, it's not showing up. The code below is simplified. There are usually multiple points on each day. The graph comes out fine other than that. …
E Phillips
  • 277
  • 1
  • 3
  • 8
25
votes
3 answers

Comparing Results from StandardScaler vs Normalizer in Linear Regression

I'm working through some examples of Linear Regression under different scenarios, comparing the results from using Normalizer and StandardScaler, and the results are puzzling. I'm using the boston housing dataset, and prepping it this way: import…
25
votes
3 answers

Efficient Cointegration Test in Python

I am wondering if there is a better way to test if two variables are cointegrated than the following method: import numpy as np import statsmodels.api as sm import statsmodels.tsa.stattools as ts y = np.random.normal(0,1, 250) x =…
Akavall
  • 82,592
  • 51
  • 207
  • 251
25
votes
4 answers

Can scipy.stats identify and mask obvious outliers?

With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x,y experimental data, and initially visually inspecting each x,y scatter plot for outliers. More generally (i.e. programmatically) is there a…
a different ben
  • 3,900
  • 6
  • 35
  • 45
24
votes
5 answers

How to obtain RMSE out of lm result?

I know there is a small difference between $sigma and the concept of root mean squared error. So, I am wondering what is the easiest way to obtain RMSE out of the lm function in R? res<-lm(randomData$price ~randomData$carat+ …
Jeff
  • 7,767
  • 28
  • 85
  • 138
24
votes
3 answers

How to use `lmplot` to plot linear regression without intercept?

The lmplot in seaborn fits regression models with an intercept. However, sometimes I want to fit regression models without an intercept, i.e. regression through the origin. For example: In [1]: import numpy as np ...: import pandas as pd ...: import…
Eastsun
  • 18,526
  • 6
  • 57
  • 81
24
votes
1 answer

How can I force cv.glmnet not to drop one specific variable?

I am running a regression with 67 observations and 32 variables. I am doing variable selection using the cv.glmnet function from the glmnet package. There is one variable I want to force into the model. (It is dropped during the normal procedure.) How can I…
lareven
  • 379
  • 2
  • 15
23
votes
4 answers

Annotate the linear regression equation

I tried fitting an OLS for Boston data set. My graph looks like below. How to annotate the linear regression equation just above the line or somewhere in the graph? How do I print the equation in Python? I am fairly new to this area. Exploring…
Naive_Natural2511
  • 687
  • 2
  • 8
  • 20
23
votes
2 answers

AttributeError: LinearRegression object has no attribute 'coef_'

I've been attempting to fit this data by a Linear Regression, following a tutorial on bigdataexaminer. Everything was working fine up until this point. I imported LinearRegression from sklearn, and printed the number of coefficients just fine. This…
23
votes
3 answers

Simple Linear Regression in Python

I am trying to implement this algorithm to find the intercept and slope for single variable: Here is my Python code to update the Intercept and slope. But it is not converging. RSS is Increasing with Iteration rather than decreasing and after some…
23
votes
5 answers

Extract Regression P Value in R

I am performing multiple regressions on different columns in a query file. I've been tasked with extracting certain results from the regression function lm in R. So far I have, > reg <- lm(query$y1 ~ query$x1 + query$x2) >…
Harmzy15
  • 487
  • 3
  • 6
  • 15
23
votes
5 answers

Constrained Linear Regression in Python

I have a classic linear regression problem of the form: y = X b where y is a response vector X is a matrix of input variables and b is the vector of fit parameters I am searching for. Python provides b = numpy.linalg.lstsq( X , y ) for solving…
ulmangt
  • 5,343
  • 3
  • 23
  • 36
22
votes
4 answers

Normal equation and Numpy 'least-squares', 'solve' methods difference in regression?

I am doing linear regression with multiple variables/features. I try to get thetas (coefficients) by using normal equation method (that uses matrix inverse), Numpy least-squares numpy.linalg.lstsq tool and np.linalg.solve tool. In my data I have n =…
21
votes
3 answers

Python pandas linear regression groupby

I am trying to use a linear regression on a group by pandas python dataframe: This is the dataframe df: group date value A 01-02-2016 16 A 01-03-2016 15 A 01-04-2016 14 A 01-05-2016 17…
jeangelj
  • 4,338
  • 16
  • 54
  • 98