Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. The variables are related stochastically rather than deterministically.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

height_i = b0 + b1·age_i + ε_i, where b0 is the intercept, b1 is the coefficient that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject
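As a rough illustration of how b0 and b1 in Model 1 could be estimated, the sketch below uses the standard closed-form least-squares formulas for a single predictor. The function name `fit_simple_ols` and the age/height numbers are made up for this example and are not taken from any real dataset.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical ages (years) and heights (cm), for illustration only.
ages = [2, 4, 6, 8, 10, 12]
heights = [86.0, 102.5, 115.0, 127.5, 138.0, 149.5]

b0, b1 = fit_simple_ols(ages, heights)

# The residuals are the estimated error terms, one per subject i.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
```

With an intercept in the model, the least-squares residuals sum to zero up to floating-point error, which is a quick sanity check on any implementation of this kind.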

Model 2

height_i = b0 + b1·age_i + b2·sex_i + ε_i, where the variable sex is dichotomous

In linear regression, a response Y is modelled as a linear function of the data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional data can be represented as:

Y = w1 x1 + w2 x2 + ... + wk xk
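One common way to estimate the weights w1 ... wk is ordinary least squares via the normal equations (XᵀX)w = XᵀY. The sketch below is a minimal pure-Python version under that assumption; the helper names `solve_linear_system` and `fit_ols` and the data are hypothetical, and a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_ols(rows, y):
    """Least-squares weights for y = w1*x1 + ... + wk*xk via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1.0*x1 + 2.0*x2 - 0.5*x3.
# x1 is a constant 1, so w1 plays the role of an intercept.
rows = [
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 0.0],
    [1.0, 2.0, 4.0],
    [1.0, 3.0, 1.0],
    [1.0, 4.0, 3.0],
]
y = [1.0 * a + 2.0 * b - 0.5 * c for a, b, c in rows]

weights = fit_ols(rows, y)
```

Because the data here contain no noise, the fit should recover the generating weights (1.0, 2.0, -0.5) up to floating-point error. The model as written has no separate intercept term, which is why a column of ones is included among the predictors.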

Further reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Looping regressions on unbalanced data set in R (using apply functions)

I have a dataset of 100 different countries and for each country five variables. For each country, I want to do a linear regression and store the results afterwards. The main problem is, for some countries I have no data for some variables. My data…
Daniel Ryback
  • 398
  • 6
  • 12
2
votes
2 answers

Align dates in R data.table for linear regression

I have a data.table with returns on n dates for m securities. I would like to do a multiple linear regression in the form of lm(ReturnSec1 ~ ReturnSec2 + ReturnSec3 + ... + ReturnSecM). The problem that I am having is that there might be dates…
Wolfgang Wu
  • 834
  • 6
  • 16
2
votes
1 answer

How do you predict outcomes from a new dataset using a model created from a different dataset in R?

I could be missing something about prediction -- but my multiple linear regression is seemingly working as expected: > bigmodel <- lm(score ~ lean + gender + age, data = mydata) > summary(bigmodel) Call: lm(formula = score ~ lean + gender + age,…
Ryan
  • 501
  • 1
  • 12
  • 26
2
votes
1 answer

Adding statsmodels 'predict' results to a Pandas dataframe

It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict function returns (non-indexed) results of a potentially different length than the dataset on which predictions are…
orome
  • 45,163
  • 57
  • 202
  • 418
2
votes
1 answer

Force step() to keep a certain variable

I'm using step() to find a model to adjust a score based on other variables. My full model is thus : mod<-lm(Adjusted.score ~ original.score + X1 + X2 + X3 + ... + X10) It's logical that I need to keep the variable original.score in the final model…
user2568648
  • 3,001
  • 8
  • 35
  • 52
2
votes
1 answer

Use a function with a linear regression model

I can run multiple linear regressions, and in each model estimate coefficients by removing one observation from the data.frame like this: library(plyr) as.data.frame(laply(1:nrow(mtcars), function(x) coef(lm(mpg ~ hp + wt, mtcars[-x,])))) …
luciano
  • 13,158
  • 36
  • 90
  • 130
2
votes
2 answers

standard error of outcome in lm and lme

I have the following linear models library(nlme) fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1) fm2.lm <- lm(distance ~ age + Sex,data = Orthodont) How can I obtain the standard error of distance with age and Sex?
ECII
  • 10,297
  • 18
  • 80
  • 121
2
votes
0 answers

Why does regtol.int() re-sort my X variable in ascending order?

I'm pretty new at R, so I guess I must be doing something wrong. I have a dataset named "series" with two columns, V1=CP and V2=CU, and I want to perform a linear regression with CU as the independent variable, and then calculate tolerance intervals…
2
votes
0 answers

Pandas Rolling OLS Bug with Version 0.12.0

I have the following example data for performing a rolling OLS calculation (here I am doing it from the debugger): (Pdb) rhs ['Yield'] (Pdb) lhs 'Returns' (Pdb) min_periods 12 (Pdb) window 60 (Pdb) intercept True (Pdb) print…
ely
  • 74,674
  • 34
  • 147
  • 228
2
votes
1 answer

How to use lm function for large number of attributes

I have a dataset with 1 label attribute and 784 pixel attributes with 42000 rows like below label pixel0 pixel1 pixel2 ........... pixel783 0 1 0 0 16 . . 1 2 15 1 …
2
votes
1 answer

Performing linear regression on a log-log (base 10) plot Matlab

I have two sets of data: Peak Velocity and Amplitude. The relation between the two parameters is not linear and I used a logarithmic (base 10) plot before performing linear regressions (this process is supposed to be equivalent to a power law…
Flowers
  • 59
  • 1
  • 2
  • 12
2
votes
5 answers

Gradient Descent in linear regression

I am trying to implement linear regression in Java. My hypothesis is theta0 + theta1 * x[i]. I am trying to figure out the values of theta0 and theta1 so that the cost function is minimized. I am using gradient descent to find out the value - In the…
2
votes
3 answers

change null hypothesis in lmtest in R

I have a linear model generated using lm. I use the coeftest function in the package lmtest to test a hypothesis with my desired vcov from the sandwich package. The default null hypothesis is beta = 0. What if I want to test beta = 1, for example? I…
Alex
  • 19,533
  • 37
  • 126
  • 195
2
votes
1 answer

Different Python minimization functions give different values, Why?

I’m trying to learn Python by rewriting Andrew Ng’s Machine learning course assignments from Octave (I took the class and got the certificate). I’m having issues with the optimization functions. In the course they use fmincg which is a function…
Henry80s
  • 37
  • 5
2
votes
2 answers

SPSS creating a loop for a multiple regression over several variables

For my master's thesis I have to use SPSS to analyse my data. Actually I thought that I wouldn't have to deal with very difficult statistical issues, which is still true regarding the concepts of my analysis. BUT the problem is now that in order to…