Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related: knowing that a person is 30 years old influences the chance of this person being 4 feet tall, and knowing that a person is 13 years old influences the chance of this person being 6 feet tall.

Model 1

$\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the parameter that age is multiplied by to get a prediction of height, $\varepsilon_i$ is the error term, and $i$ indexes the subject.
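As a rough illustration of how the intercept and slope of Model 1 can be estimated, here is a minimal sketch of ordinary least squares for a single predictor. The helper name `fit_simple_ols` and the age and height values are made up for this example; it is not tied to any particular library.

```python
# Ordinary least squares for Model 1: height_i = b0 + b1 * age_i + e_i.
# The ages and heights below are hypothetical values, not real data.

def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

ages = [5, 7, 9, 11, 13]              # years (hypothetical)
heights = [3.6, 4.0, 4.4, 4.7, 5.1]   # feet (hypothetical)

b0, b1 = fit_simple_ols(ages, heights)
# Residuals are the estimated error terms, one per subject.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The residuals of a least-squares fit with an intercept sum to zero (up to rounding), which is a quick sanity check on the estimates.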

Model 2

$\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous.

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional user data can be represented as:

$Y = w_1 x_1 + w_2 x_2 + \dots + w_k x_k$
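One common way to estimate the weights in the k-dimensional model is to solve the normal equations (XᵀX)w = Xᵀy. The sketch below does this in plain Python with Gaussian elimination; the function names `solve_linear_system` and `fit_ols`, the data, and the "true" weights are all assumptions made for illustration. Setting the first input to a constant 1 is a standard trick to obtain an intercept, which recovers the form of Model 2.

```python
# Least-squares estimate of w in Y = w1*x1 + ... + wk*xk via the normal
# equations (X^T X) w = X^T y. All data values are hypothetical.

def solve_linear_system(a, b):
    """Solve a @ w = b for a square matrix a using partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_ols(rows, y):
    """Return the weights minimising the squared error for the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Each row is (x1, x2, x3) = (1, age, sex); the constant 1 yields an intercept.
rows = [(1, 10, 0), (1, 12, 1), (1, 14, 0), (1, 16, 1), (1, 18, 0), (1, 20, 1)]
# Noise-free targets generated from the assumed weights (3.0, 0.1, 0.2).
y = [3.0 + 0.1 * age + 0.2 * sex for _, age, sex in rows]

w = fit_ols(rows, y)
print([round(v, 6) for v in w])
```

Because the targets here contain no noise, the estimate should match the assumed weights up to floating-point error. In practice, numerical libraries usually solve the least-squares problem with QR or SVD factorisations instead of forming XᵀX, since those are better conditioned.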

Reading: "Statistical Modeling: The Two Cultures", http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
36
votes
9 answers

How to find the features names of the coefficients using scikit linear regression?

I use scikit linear regression and if I change the order of the features, the coef are still printed in the same order, hence I would like to know the mapping of the feature with the coeff. #training the model model_1_features = ['sqft_living',…
amehta
  • 1,307
  • 3
  • 18
  • 22
36
votes
2 answers

How (and why) do you use contrasts?

Under what cases do you create contrasts in your analysis? How is it done and what is it used for? I checked ?contrasts and ?C - both lead to "Chapter 2 of Statistical Models in S", which is not readily available to me.
Tal Galili
  • 24,605
  • 44
  • 129
  • 187
35
votes
8 answers

ValueError: Expected 2D array, got 1D array instead:

While practicing Simple Linear Regression Model I got this error, I think there is something wrong with my data set. Here is my data set: Here is independent variable X: Here is dependent variable Y: Here is X_train Here Is Y_train This is error…
danyialKhan
  • 697
  • 2
  • 8
  • 12
35
votes
2 answers

Pandas rolling regression: alternatives to looping

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS regression in an efficient manner has been asked…
Brad Solomon
  • 38,521
  • 31
  • 149
  • 235
33
votes
6 answers

python linear regression predict by date

I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format. This is the dataframe I have: data_df = date value 2016-01-15 1555 2016-01-16 1678 2016-01-17 1789 ... y =…
jeangelj
  • 4,338
  • 16
  • 54
  • 98
32
votes
3 answers

How to add interaction term in Python sklearn

If I have independent variables [x1, x2, x3] If I fit linear regression in sklearn it will give me something like this: y = a*x1 + b*x2 + c*x3 + intercept Polynomial regression with poly =2 will give me something like y = a*x1^2 + b*x1*x2…
Dylan
  • 915
  • 3
  • 13
  • 20
32
votes
2 answers

How does predict.lm() compute confidence interval and prediction interval?

I ran a regression: CopierDataRegression <- lm(V1~V2, data=CopierData1) and my task was to obtain a 90% confidence interval for the mean response given V2=6 and 90% prediction interval when V2=6. I used the following code: X6 <-…
Mitty
  • 475
  • 1
  • 5
  • 9
31
votes
8 answers

Are there any Linear Regression Function in SQL Server?

Are there any linear regression functions in SQL Server 2005/2008, similar to the linear regression functions in Oracle?
rao
  • 1,024
  • 2
  • 11
  • 17
31
votes
3 answers

OLS Regression: Scikit vs. Statsmodels?

Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients are all different by large amounts. This…
Nat Poor
  • 451
  • 1
  • 6
  • 6
30
votes
2 answers

lme4::lmer reports "fixed-effect model matrix is rank deficient", do I need a fix and how to?

I am trying to run a mixed-effects model that predicts F2_difference with the rest of the columns as predictors, but I get an error message that says fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients. From this…
Lisa
  • 909
  • 2
  • 13
  • 31
30
votes
8 answers

Scikit-Learn Linear Regression how to get coefficient's respective features?

I'm trying to perform feature selection by evaluating my regressions coefficient outputs, and select the features with the highest magnitude coefficients. The problem is, I don't know how to get the respective features, as only coefficients are…
jeffrey
  • 3,196
  • 7
  • 26
  • 44
29
votes
3 answers

Python scikit learn Linear Model Parameter Standard Error

I am working with sklearn and specifically the linear_model module. After fitting a simple linear as in import pandas as pd import numpy as np from sklearn import linear_model randn = np.random.randn X = pd.DataFrame(randn(10,3),…
Ryan
  • 655
  • 2
  • 8
  • 12
28
votes
2 answers

How to plot statsmodels linear regression (OLS) cleanly

Problem Statement: I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it: Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried statsmodels' plot_fit method, but the plot is a little…
Alex Lenail
  • 12,992
  • 10
  • 47
  • 79
28
votes
1 answer

Linear Regression and Gradient Descent in Scikit learn?

In this Coursera course for machine learning, it says gradient descent should converge. I'm using Linear regression from scikit learn. It doesn't provide gradient descent info. I have seen many questions on StackOverflow to implement linear…
Netro
  • 7,119
  • 6
  • 40
  • 58
27
votes
3 answers

Why is numpy.linalg.pinv() preferred over numpy.linalg.inv() for creating inverse of a matrix in linear regression

If we want to search for the optimal parameters theta for a linear regression model by using the normal equation with: theta = inv(X^T * X) * X^T * y one step is to calculate inv(X^T*X). Therefore numpy provides np.linalg.inv() and…
2Obe
  • 3,570
  • 6
  • 30
  • 54