Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

$\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is a parameter that age is multiplied by to get a prediction of height, $\varepsilon_i$ is the error term, and $i$ is the subject

Model 2

$\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous
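As a rough illustration of Model 2, the coefficients can be estimated by ordinary least squares. The sketch below uses NumPy; the function name `fit_height_model`, the 0/1 coding of sex, and the synthetic data (including the "true" coefficients 80, 5 and 4) are assumptions made up for this example, not real measurements.

```python
import numpy as np

def fit_height_model(age, sex, height):
    """Least-squares estimates (b0, b1, b2) for height = b0 + b1*age + b2*sex + error."""
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)      # assumed to be coded 0/1
    height = np.asarray(height, dtype=float)
    # Design matrix: a column of ones for the intercept, then age, then the sex indicator.
    X = np.column_stack([np.ones_like(age), age, sex])
    coef, *_ = np.linalg.lstsq(X, height, rcond=None)
    return coef

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(5, 18, size=n)
sex = rng.integers(0, 2, size=n)
height = 80.0 + 5.0 * age + 4.0 * sex + rng.normal(0.0, 3.0, size=n)

b0, b1, b2 = fit_height_model(age, sex, height)
print(f"intercept={b0:.2f}, age slope={b1:.2f}, sex effect={b2:.2f}")
```

Dropping the sex column from the design matrix gives Model 1. Because the error term is random, the estimates only approximate the coefficients used to generate the data.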

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. E.g., a linear regression model for k-dimensional user data can be represented as:

$Y = w_1 x_1 + w_2 x_2 + \dots + w_k x_k$
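A minimal sketch of how the weights of such a model might be obtained, assuming NumPy and noise-free synthetic data: once "estimated" in closed form by least squares, and once "learned" by gradient descent on the mean squared error. The helper `fit_gradient_descent`, the step size `lr`, the iteration count and the weights in `w_true` are hypothetical choices for this example.

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Learn w in y ~ X @ w by gradient descent on the mean squared error."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of mean((X @ w - y)**2)
        w -= lr * grad
    return w

# Synthetic k = 3 dimensional data without noise, invented for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true                      # Y = w1*x1 + w2*x2 + w3*x3

w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares estimate
w_gd = fit_gradient_descent(X, y)             # iteratively learned estimate

print("least squares:   ", w_ls)
print("gradient descent:", w_gd)
```

The step size matters: if `lr` is too large for the scale of the features, the cost grows instead of shrinking, so standardizing the features or lowering `lr` is a common remedy.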

Reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
16
votes
2 answers

Using scikit-learn (sklearn), how to handle missing data for linear regression?

I tried this but couldn't get it to work for my data: Use Scikit Learn to do linear regression on a time series pandas data frame My data consists of 2 DataFrames. DataFrame_1.shape = (40,5000) and DataFrame_2.shape = (40,74). I'm trying to do…
O.rka
  • 29,847
  • 68
  • 194
  • 309
16
votes
3 answers

How to get R-squared for robust regression (RLM) in Statsmodels?

When it comes to measuring goodness of fit - R-Squared seems to be a commonly understood (and accepted) measure for "simple" linear models. But in case of statsmodels (as well as other statistical software) RLM does not include R-squared together…
Primer
  • 10,092
  • 5
  • 43
  • 55
16
votes
6 answers

Efficient Multiple Linear Regression in C# / .Net

Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the 1000's (with 3 or 4 different inputs). After reading this article on multiple linear regression I tried…
mike
  • 3,146
  • 5
  • 32
  • 46
16
votes
5 answers

How to Loop/Repeat a Linear Regression in R

I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable (Lung) for each regression is taken from one column of a csv table of 22,000 columns. One of the independent…
user4438232
16
votes
1 answer

Graphing perpendicular offsets in a least squares regression plot in R

I'm interested in making a plot with a least squares regression line and line segments connecting the datapoints to the regression line as illustrated here in the graphic called perpendicular…
D W
  • 2,979
  • 4
  • 34
  • 45
16
votes
5 answers

3D Linear Regression

I want to write a program that, given a list of points in 3D-space, represented as an array of x,y,z coordinates in floating point, outputs a best-fit line in this space. The line can/should be in the form of a unit vector and a point on the…
Jimmy
  • 435
  • 1
  • 5
  • 18
16
votes
1 answer

How to get the confidence intervals for LOWESS fit using R?

I didn't find any satisfactory answer to the confidence intervals (CIs) for LOWESS regression line of the 'stats' package of R: plot(cars, main = "lowess(cars)") lines(lowess(cars), col = 2) But I'm unsure how to draw a 95% CI around it?? However,…
ToNoY
  • 1,358
  • 2
  • 22
  • 43
15
votes
2 answers

matrices are not aligned Error: Python SciPy fmin_bfgs

Problem Synopsis: When attempting to use the scipy.optimize.fmin_bfgs minimization (optimization) function, the function throws a derphi0 = np.dot(gfk, pk) ValueError: matrices are not aligned error. According to my error checking this…
SaB
  • 747
  • 1
  • 9
  • 25
15
votes
2 answers

Access standardized residuals, cook's values, hatvalues (leverage) etc. easily in Python?

I am looking for influence statistics after fitting a linear regression. In R I can obtain them (e.g.) like this: hatvalues(fitted_model) #hatvalues (leverage) cooks.distance(fitted_model) #Cook's D values rstandard(fitted_model) #standardized…
Jaynes01
  • 521
  • 1
  • 5
  • 20
15
votes
1 answer

How do I get RSS from a linear model output

Below is a linear model output for a dataset consisting of a response variable and three explanatory variables. How do I get the RSS of the original regression? Call: lm(formula = y ~ x1 + x2 + x3) Residuals: Min 1Q Median 3Q …
wszsdmjj
  • 151
  • 1
  • 1
  • 4
15
votes
4 answers

Predicted vs. Actual plot

I'm new to R and statistics and haven't been able to figure out how one would go about plotting predicted values vs. Actual values after running a multiple linear regression. I have come across similar questions (just haven't been able to understand…
John
  • 387
  • 2
  • 3
  • 14
15
votes
2 answers

Predicting values using an OLS model with statsmodels

I calculated a model using OLS (multiple linear regression). I divided my data to train and test (half each), and then I would like to predict values for the 2nd half of the labels. model = OLS(labels[:half], data[:half]) predictions =…
nickb
  • 882
  • 3
  • 8
  • 22
14
votes
1 answer

support vector machines - a simple explanation?

So, I'm trying to understand how the SVM algorithm works, but I just cannot figure out how you transform some datasets into points of an n-dimensional plane that would have a mathematical meaning in order to separate the points through a hyperplane and…
flowerpower
  • 883
  • 9
  • 19
14
votes
1 answer

Get confidence interval from sklearn linear regression in python

I want to get a confidence interval of the result of a linear regression. I'm working with the boston house price dataset. I've found this question: How to calculate the 99% confidence interval for the slope in a linear regression model in…
Huondui
  • 383
  • 1
  • 3
  • 12
14
votes
3 answers

Increasing cost for linear regression

I implemented, for training purposes, a linear regression in Python. The problem is that the cost is increasing instead of decreasing. For the data I use the Airfoil Self-Noise Data Set. Data can be found here. I import the data as follows: import pandas…
Vetouz
  • 159
  • 3
  • 19