Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression formalizes the relationship between variables as a mathematical equation that is linear in its unknown parameters. It describes how one or more response variables are related to one or more explanatory variables. The variables are related stochastically rather than deterministically: the equation includes a random error term.

Example

Height and age are probabilistically distributed over humans, and they are stochastically related. Knowing that a person is 30 years old changes the probability that this person is 4 feet tall. Knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

$\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the slope (the coefficient by which age is multiplied to predict height), $\varepsilon_i$ is the error term, and $i$ indexes the subject.
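The following is a minimal sketch in Python, using made-up toy values rather than real measurements. It fits Model 1 by ordinary least squares with the closed-form estimates. The slope $b_1$ is the sum of cross-products of the centred age and height values, divided by the sum of squares of the centred ages. The intercept $b_0$ makes the fitted line pass through the point of means. The function name `fit_simple_ols` is hypothetical.

```python
from statistics import mean


def fit_simple_ols(age, height):
    """Ordinary least-squares estimates (b0, b1) for height_i = b0 + b1*age_i + e_i."""
    if len(age) != len(height) or len(age) < 2:
        raise ValueError("need at least two paired observations")
    a_bar, h_bar = mean(age), mean(height)
    # Slope: sum of cross-products of centred values over sum of squares of centred ages
    sxy = sum((a - a_bar) * (h - h_bar) for a, h in zip(age, height))
    sxx = sum((a - a_bar) ** 2 for a in age)
    if sxx == 0:
        raise ValueError("age must vary across subjects")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean age, mean height)
    b0 = h_bar - b1 * a_bar
    return b0, b1


# Hypothetical toy data, not real measurements
age = [2, 4, 6, 8]
height = [86.0, 102.0, 115.0, 128.0]
b0, b1 = fit_simple_ols(age, height)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

With these toy values the estimates are $b_0 = 73$ and $b_1 = 6.95$. Fitting the same data with `lm(height ~ age)` in R should give the same least-squares coefficients.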

Model 2

$\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous.
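A dichotomous predictor such as sex usually enters the model as a 0/1 dummy variable. $b_2$ is then the expected difference in height between the two groups at the same age. The sketch below makes three assumptions:

- NumPy is available.
- `"F"` is the reference level, which is an arbitrary, hypothetical choice.
- The toy heights are noise-free values generated from $b_0 = 50$, $b_1 = 5$, $b_2 = 10$, so the fit can be checked by eye.

The function name `fit_model2` is likewise hypothetical.

```python
import numpy as np


def fit_model2(age, sex, height, reference="F"):
    """Least-squares fit of height_i = b0 + b1*age_i + b2*sex_i + e_i.

    sex is dummy-coded: 0 for the reference level, 1 for the other level.
    The default reference level "F" is an arbitrary, hypothetical choice.
    """
    age = np.asarray(age, dtype=float)
    height = np.asarray(height, dtype=float)
    sex_dummy = np.array([0.0 if s == reference else 1.0 for s in sex])
    # Design matrix columns: ones (for b0), age (for b1), sex dummy (for b2)
    X = np.column_stack([np.ones_like(age), age, sex_dummy])
    coef, *_ = np.linalg.lstsq(X, height, rcond=None)
    return coef  # array([b0, b1, b2])


# Hypothetical noise-free data generated from b0 = 50, b1 = 5, b2 = 10
age = [2, 4, 6, 8, 3, 7]
sex = ["F", "M", "F", "M", "M", "F"]
height = [60.0, 80.0, 80.0, 100.0, 75.0, 85.0]
b0, b1, b2 = fit_model2(age, sex, height)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Choosing the other level as the reference flips the sign of $b_2$ and shifts $b_0$ by $b_2$. The fitted values themselves are unchanged.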

In linear regression, the response Y is modelled as a linear function of the data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional data can be written as:

$Y = w_1 x_1 + w_2 x_2 + \dots + w_k x_k$
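In matrix form the model above is $y \approx Xw$, where each row of $X$ holds one observation's $x_1, \dots, x_k$ and $w$ collects the parameters W. A standard way to estimate $w$ is least squares, whose solution satisfies the normal equations $X^\top X w = X^\top y$. The sketch below solves those equations directly. It assumes NumPy is available, and the function name `fit_weights`, the data, and the weights are all made up for illustration. The formula as written has no intercept or error term; an intercept can be added by appending a column of ones to $X$.

```python
import numpy as np


def fit_weights(X, y):
    """Estimate w in y ≈ X @ w by solving the normal equations (X^T X) w = X^T y.

    Assumes X has full column rank; otherwise X^T X is singular.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: 5 observations of k = 3 variables
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0],
    [1.0, 1.0, 1.0],
    [3.0, 2.0, 0.0],
])
w_true = np.array([1.5, -2.0, 0.5])  # made-up weights used to generate y
y = X @ w_true                        # noise-free responses, so w_true is recoverable
w_hat = fit_weights(X, y)
print(w_hat)
```

Forming $X^\top X$ squares the condition number of $X$. For ill-conditioned data, `np.linalg.lstsq`, which works on $X$ directly, is usually the safer choice.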

Further reading: Leo Breiman, "Statistical Modeling: The Two Cultures", http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, the software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2 votes, 1 answer

How to set up balanced one-way ANOVA for lm()

I have data: dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5), EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8), Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1), …
2 votes, 3 answers

biglm predict unable to allocate a vector of size xx.x MB

I have this code: library(biglm) library(ff) myData <- read.csv.ffdf(file = "myFile.csv") testData <- read.csv(file = "test.csv") form <- dependent ~ . model <- biglm(form, data=myData) predictedData <- predict(model, newdata=testData) the model…
antonio
2 votes, 1 answer

Identify regression sample in R

I have a general question. Is there any way that I can identify (or tag) the observations used in a regression in R? lligator = data.frame(lnLength = c(3.87, 3.61, NA, 3.43, 3.81, 3.83, 3.46, 3.76, 3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),lnWeight…
Yashar
2 votes, 4 answers

Shouldn't we take the average of n models in cross validation in linear regression?

I have a question regarding cross validation in a linear regression model. From my understanding, in cross validation we split the data into (say) 10 folds, train on 9 folds, and use the remaining fold for testing. We repeat this…
Binay
2 votes, 1 answer

Linear regression gradient descent using C#

I'm taking the Coursera machine learning course right now and I can't get my gradient descent linear regression function to minimize. I use one dependent variable, an intercept, and four values of x and y, so the equations are fairly simple.…
2 votes, 1 answer

Must L2 regularization be added to the cost function when using linear regression?

Must L2 regularization be added to the cost function when using linear regression? I'm not adding L2 or taking it into account when computing the cost. Is that wrong? The code snippet below should be sufficient: def gradient(self, X, Y, alpha,…
KenobiBastila
2 votes, 1 answer

Python Linear Regression Error

I have two arrays with the following values: >>> x = [24.0, 13.0, 12.0, 22.0, 21.0, 10.0, 9.0, 12.0, 7.0, 14.0, 18.0, ... 1.0, 18.0, 15.0, 13.0, 13.0, 12.0, 19.0, 13.0] >>> y = [10.0, 9.0, 22.0, 7.0, 4.0, 7.0, 56.0, 5.0, 24.0, 25.0, 11.0,…
Chiel
2 votes, 0 answers

Forecasting panel data and time series

I have a panel data set of, let's say, 1000 observations, so i = 1, 2, ..., 1000. The data are daily for one month, so t = 1, 2, ..., 31. I want to estimate individual-specific models in R: $y_{i,10} = \alpha_i + \beta_i y_{i,9} + \gamma_i y_{i,8} + \dots + \delta_i y_{i,1} + \epsilon_{it}$ and then produce…
quant
2 votes, 0 answers

Efficient Cholesky decomposition of ABA^T given cholesky(B)

Given n×n matrices A, B, and B^(1/2) (i.e. cholesky(B)), where B is positive definite, what are efficient approaches to obtaining cholesky(ABA^T)? Is it possible to avoid another full Cholesky decomposition?
Charlie
2 votes, 1 answer

Raster linear and conditional regression using raster stacks by month in R

I have two raster stacks and I want to carry out a regression analysis. If each raster in each stack represents a month of the year (6 data points would be three months in two years, i.e. January, February and March for two different years), how do I…
Joke O.
2 votes, 0 answers

Linear regression with leastsq() and global minimum not found

In Python, scipy.optimize.leastsq() is normally used for non-linear regression. However, leastsq() should in principle also work with linear fitting functions. Here is what appears to be a simple linear regression problem that leastsq()…
edison1093
2 votes, 2 answers

"Force" model onto data in R? (Linear Regression)

I've been self-studying Discovering Statistics Using R by Andy Field and have come across this passage: Data splitting: This approach involves randomly splitting your data set, computing a regression equation on both halves of the data and then…
2 votes, 1 answer

Calculating multiple R squared values by groups

This toy example allows me to reactively update the R squared value for two vectors I'm interested in from the mtcars dataset for linear regression. library(shiny) ui <- fluidPage( selectInput("xcol","Select X Variable",""), …
Scott
2 votes, 1 answer

Simple Linear Regression with constraint using Math.net and C#

I'm using Math.net and C# for simple linear regression of two double arrays (XValues, YValues) which contain physiological data. There are good grounds for constraining the intercept to the origin. At the moment I'm using: Tuple r =…
Andy Pybus
2 votes, 1 answer

R: function returns numeric(0) but code works outside function

I am currently working with R and I'm trying to write a function that derives the partial residuals for a multiple linear model. I know that there are existing functions in R but I want to write a function myself. However, my problem is that…
Philipp