Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

$\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is a parameter that age is multiplied by to get a prediction of height, $\varepsilon$ is the error term, and $i$ is the subject

Model 2

$\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous
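As a rough illustration of how the parameters of Model 1 might be estimated, here is a minimal sketch of ordinary least squares for a single predictor, using only the Python standard library. The function name `fit_simple_ols` and the age/height numbers are made up for this example and do not come from any real data set.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope: change in height per year of age
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data: ages in years, heights in cm (invented for illustration).
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

b0, b1 = fit_simple_ols(ages, heights)

# The residuals play the role of the error term in Model 1.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
print(b0, b1)  # roughly 74.8 and 6.5 for this made-up sample
```

When the model includes an intercept, as here, the least squares residuals sum to zero up to rounding error, which is a quick sanity check on a fit.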

In linear regression, user data X is modelled using a linear function Y, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional user data can be represented as:

$Y = w_1 x_1 + w_2 x_2 + \dots + w_k x_k$
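The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `predict` and the weights and feature values are hypothetical, chosen so the arithmetic is easy to follow.

```python
def predict(weights, features):
    """Compute Y = w1*x1 + w2*x2 + ... + wk*xk for one observation."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights and one 3-dimensional observation.
w = [0.5, -1.0, 2.0]
x = [4.0, 3.0, 1.5]

y_hat = predict(w, x)  # 0.5*4.0 - 1.0*3.0 + 2.0*1.5 = 2.0
print(y_hat)
```

This form has no separate intercept term. One common way to get one, as in Models 1 and 2, is to append a constant feature equal to 1 so that its weight acts as the intercept.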

Reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, the scientific software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Linear fit with Math.NET: error in data and error in fit parameters?

I am trying to use Math.NET to perform a simple linear fit through a small set of datapoints. Using Fit.Line I am very easily able to perform the linear fit and obtain the slope and intercept: Tuple result = Fit.Line(xdata,…
Nick Thissen
  • 1,802
  • 4
  • 27
  • 38
2
votes
1 answer

How to apply the results of linear regression on a training set of data to a testing set of data?

I have two non-empty dataframes: training and testing. Each of these dataframes has two columns: Y and X, in this order. I have applied linear regression analysis to training as follows: m <- lm(Y ~ X, data = training) I would like to apply the…
Evan Aad
  • 5,699
  • 6
  • 25
  • 36
2
votes
3 answers

Residuals from 1:1 line

I want to measure the distance between a set of points and a 1:1 line. I can build a linear model and get the residuals from the best fit, but I can't get the measure from a 1:1 line. Any helpful hints? #build a df of random numbers …
I Del Toro
  • 913
  • 4
  • 15
  • 36
2
votes
1 answer

Specifying a Constant in Statsmodels Linear Regression?

I want to use the statsmodels.regression.linear_model.OLS package to do a prediction, but with a specified constant. Currently, I can specify the presence of a constant with an argument: (from docs:…
Max Song
  • 1,607
  • 2
  • 18
  • 26
2
votes
2 answers

how to define graphical bounds of abline linear regression in R

I am trying to truncate the ends of an abline, which is actually just a linear regression of my data. fit1=lm(logy~logx) > fit1 Call: lm(formula = logy ~ logx) Coefficients: (Intercept) logx -5.339 -2.115 Where logx is…
James
  • 699
  • 8
  • 13
2
votes
1 answer

R - Extending Linear Model beyond scatterplot3d

I have created a scatterplot3d with a linear model applied. Unfortunately the results of the LM are subtle and need to be emphasised, my question is how can I extend the LM grid outside of the 'cube'. Plot: Code: Plot1 <-scatterplot3d( …
Methexis
  • 2,739
  • 5
  • 24
  • 34
2
votes
1 answer

Error when introducing dummy variables in a regression in Matlab

I am running some regressions in Matlab. My first three regressions are: tbl1=table(Y1,X1); mdl1=fitlm(tbl1,'Y1~X1'); mdl12=fitglm(tbl1,'Y1~X1','Distribution','binomial','link','probit'); mdl13=fitglm(tbl1,'Y1~X1','Distribution','binomial');…
2
votes
1 answer

Model Prediction for pooled regression model in panel data

I'm trying to produce a predictive model where I performed multiple pooled regressions in each year (based on previous years) and thus allow coefficients to vary across time. (This might not make sense in the sample data provided, but it is done in…
2
votes
1 answer

Regression coefficients and abline in R - linear regression

Thanks in advance for your attention. Here is my problem: I have a dataframe, and this is its structure (I have deleted some rows): DATE CASES 02/01/2013 1 02/01/2013 2 03/01/2013 3 04/01/2013 4 04/01/2013 5 08/01/2013 …
Eka
  • 47
  • 1
  • 7
2
votes
2 answers

Performance issue in computing multiple linear regression with huge data sets

I am using np.linalg.lstsq to calculate a multiple linear regression. My data set is huge: it has 20,000 independent variables (X) and 1 dependent variable (Y). Each independent variable has 10,000 data points. Something like this: X1 …
user2567857
  • 483
  • 7
  • 25
2
votes
1 answer

R - Unit specific time trends in regression

In a regression I am trying to model unit specific time trends but I keep running into difficulties. In R when I estimate the model with unit and year fixed effects like lm(y~x+factor(unit)+factor(time)) I get perfectly normal results. However when…
horseoftheyear
  • 917
  • 11
  • 23
2
votes
1 answer

How to speed up Stochastic Gradient Descent?

I'm trying to fit a regression model with an L1 penalty, but I'm having trouble finding an implementation in python that fits in a reasonable amount of time. The data I've got is on the order of 100k by 500 (sidenote; several of the variables are…
choldgraf
  • 3,539
  • 4
  • 22
  • 27
2
votes
1 answer

Line fit from an array of 2d vectors

I have a problem in some C code; I assume it belongs here rather than on the Mathematics exchange. I have an array of changes in x and y position generated by a user dragging a mouse. How could I determine whether a straight line was drawn? I am currently…
Morgoth
  • 4,935
  • 8
  • 40
  • 66
2
votes
1 answer

How to calculate the 'Coefficient of determination' for a linear model in R?

I have the following set of x and y values: x = c(1:150) y = x^-.5 * 155 + (runif(length(x), min=-3, max=3)) And run a linear regression on the data: plot(x, y, log="xy", cex=.5) model = lm(log(y) ~ log(x)) model Now I'd like to have a measure…
R_User
  • 10,682
  • 25
  • 79
  • 120
2
votes
3 answers

How to plot CCDF graph on a logarithmic scale?

I want to plot a CCDF graph for some of my simulated power-law tail data on a log-log axis, below is my R code of plotting a CCDF graph on a normal axis, I used the code on the link: (How to plot a CCDF gragh?) > load("fakedata500.Rda") >…
user3579282
  • 45
  • 2
  • 9