Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression formalizes relationships between variables as mathematical equations. It describes how one or more random variables are related to one or more other variables. The relationship between the variables is stochastic rather than deterministic.

Example

Height and age are probabilistically distributed over humans, and they are stochastically related. Knowing that a person is 30 years old changes the probability that this person is 4 feet tall. Likewise, knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

height_i = b_0 + b_1 * age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject

Model 2

height_i = b_0 + b_1 * age_i + b_2 * sex_i + ε_i, where the variable sex is dichotomous
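
As a minimal sketch of how Model 2 could be fitted, the Python snippet below builds a design matrix with an intercept column, age, and a 0/1-coded sex variable, then estimates the coefficients by ordinary least squares with numpy.linalg.lstsq. The data values, the 0/1 coding, and the generating coefficients are made-up assumptions for illustration only. The heights are generated without an error term so that the estimates can be checked against the known values.

```python
import numpy as np

# Hypothetical, noise-free toy data (not taken from any real study).
age = np.array([5.0, 8.0, 10.0, 12.0, 15.0, 18.0])
sex = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # dichotomous, coded 0/1 (assumed coding)

# Assumed coefficients used only to generate the toy heights (in cm):
# intercept, age coefficient, sex coefficient.
b_true = np.array([80.0, 5.0, 4.0])

# Design matrix: a column of ones for the intercept, then age and sex.
X = np.column_stack([np.ones_like(age), age, sex])

# Heights generated without an error term, so the fit should recover b_true.
height = X @ b_true

# Ordinary least squares estimate of the three coefficients.
b_hat, _, rank, _ = np.linalg.lstsq(X, height, rcond=None)

print("estimated coefficients:", b_hat)
print("rank of design matrix:", rank)
```

Dropping the sex column from X gives the corresponding fit for Model 1. With real data the error term is not zero, so the estimates would only approximate the underlying coefficients.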

In linear regression, the output Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional user data can be written as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
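
One common way to estimate W is ordinary least squares. The sketch below, in Python with NumPy, solves the normal equations for a model without an intercept, matching the equation above. The function names fit_weights and predict, the synthetic data, and the choice of k = 3 are assumptions made for illustration and do not come from any particular library.

```python
import numpy as np

def fit_weights(X, y):
    """Estimate w by solving the normal equations (X^T X) w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(X, w):
    """Compute Y = w_1 x_1 + ... + w_k x_k for every row of X."""
    return X @ w

# Synthetic data: 50 observations of k = 3 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Assumed weights used to generate noise-free outputs.
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

# Because y has no noise, the estimate should match w_true
# up to floating-point error.
w_hat = fit_weights(X, y)
print("estimated weights:", w_hat)
```

In practice, numpy.linalg.lstsq or a dedicated statistics package is usually preferred over forming X^T X explicitly, because the normal equations can be numerically ill-conditioned when features are strongly correlated.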

Reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, scientific software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
58
votes
10 answers

Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function. The code I've written solves the problem correctly but does not…
OhNoNotScott
  • 824
  • 2
  • 9
  • 12
54
votes
2 answers

How to make seaborn regplot partially see through (alpha)

When using seaborn barplot, I can specify an alpha to make the bars semi-translucent. However, when I try this with seaborn regplot, I get an error saying this is an unexpected argument. I read the documentation online but didn't find much. Could…
qwertylpc
  • 2,016
  • 7
  • 24
  • 34
54
votes
6 answers

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing: $ python Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin >>> import statsmodels.api as sm >>> statsmodels.__version__ '0.5.0' >>> import numpy >>> y =…
Tom
  • 2,769
  • 2
  • 17
  • 22
54
votes
1 answer

Is there a better alternative than string manipulation to programmatically build formulas?

Everyone else's functions seem to take formula objects and then do dark magic to them somewhere deep inside and I'm jealous. I'm writing a function that fits multiple models. Parts of the formulas for these models remain the same and part change…
bokov
  • 3,444
  • 2
  • 31
  • 49
51
votes
6 answers

TensorFlow: "Attempting to use uninitialized value" in variable initialization

I am trying to implement multivariate linear regression in Python using TensorFlow, but have run into some logical and implementation issues. My code throws the following error: Attempting to use uninitialized value Variable Caused by op…
NEW USER
  • 797
  • 2
  • 7
  • 11
47
votes
3 answers

predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

This R code throws a warning # Fit regression model to each cluster y <- list() length(y) <- k vars <- list() length(vars) <- k f <- list() length(f) <- k for (i in 1:k) { vars[[i]] <- names(corc[[i]][corc[[i]]!= "1"]) f[[i]] <-…
Mahsa
  • 531
  • 1
  • 5
  • 9
45
votes
3 answers

Linear Regression with a known fixed intercept in R

I want to calculate a linear regression using the lm() function in R. Additionally I want to get the slope of a regression, where I explicitly give the intercept to lm(). I found an example on the internet and I tried to read the R-help "?lm"…
R_User
  • 10,682
  • 25
  • 79
  • 120
44
votes
8 answers

Error in Confusion Matrix : the data and reference factors must have the same number of levels

I've trained a Linear Regression model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error: Error in confusionMatrix.default(pred, testing$Final) : the data and reference factors must have the same…
43
votes
5 answers

How to extract the regression coefficient from statsmodels.api?

result = sm.OLS(gold_lookback, silver_lookback ).fit() After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c how to get the values a and c?
JOHN
  • 1,411
  • 3
  • 21
  • 41
41
votes
5 answers

Linear Regression :: Normalization (Vs) Standardization

I am using Linear regression to predict data. But, I am getting totally contrasting results when I Normalize (Vs) Standardize variables. Normalization = (x - xmin) / (xmax - xmin)   Z-score Standardization = (x - xmean) / xstd   a) Also,…
39
votes
3 answers

How to force zero interception in linear regression?

I have some more or less linear data of the form: x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0] y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044,…
Kyra Tafar
  • 393
  • 1
  • 3
  • 4
39
votes
7 answers

predict.lm() with an unknown factor level in test data

I am fitting a model to factor data and predicting. If the newdata in predict.lm() contains a single factor level that is unknown to the model, all of predict.lm() fails and returns an error. Is there a good way to have predict.lm() return a…
Stephan Kolassa
  • 7,953
  • 2
  • 28
  • 48
38
votes
7 answers

Linear Regression in Javascript

I want to do Least Squares Fitting in Javascript in a web browser. Currently users enter data point information using HTML text inputs and then I grab that data with jQuery and graph it with Flot. After the user had entered in their data points I…
Chris W.
  • 37,583
  • 36
  • 99
  • 136
38
votes
1 answer

In the LinearRegression method in sklearn, what exactly is the fit_intercept parameter doing?

In the sklearn.linear_model.LinearRegression method, there is a parameter that is fit_intercept = TRUE or fit_intercept = FALSE. I am wondering if we set it to TRUE, does it add an additional intercept column of all 1's to your dataset? If I already…
user321627
  • 2,350
  • 4
  • 20
  • 43
37
votes
5 answers

Linear Regression on Pandas DataFrame using Sklearn ( IndexError: tuple index out of range)

I'm new to Python and trying to perform linear regression using sklearn on a pandas dataframe. This is what I did: data = pd.read_csv('xxxx.csv') After that I got a DataFrame of two columns, let's call them 'c1', 'c2'. Now I want to do linear…
Dinosaur
  • 645
  • 4
  • 10
  • 14