Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related: knowing that a person is 30 years old changes the probability that this person is 4 feet tall, and knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

height_i = b0 + b1 age_i + ε_i, where b0 is the intercept, b1 is a parameter that age is multiplied by to get a prediction of height, ε_i is the error term, and i is the subject

Model 2

height_i = b0 + b1 age_i + b2 sex_i + ε_i, where the variable sex is dichotomous
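
As a rough illustration of Model 1, the sketch below estimates b0 and b1 by ordinary least squares using only the Python standard library. The function name fit_simple and the age/height numbers are made up for this example; they are assumptions, not data from any real study.

```python
from statistics import mean

def fit_simple(x, y):
    """Ordinary least squares for y_i = b0 + b1 * x_i + e_i."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical ages (years) and heights (cm), for illustration only.
age = [5, 8, 11, 14, 17]
height = [110.0, 128.0, 143.0, 162.0, 175.0]

b0, b1 = fit_simple(age, height)

# Residuals are the estimates of the error term for each subject i.
residuals = [h - (b0 + b1 * a) for a, h in zip(age, height)]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding error, which is a quick sanity check on a fit like this. Model 2 would add a 0/1 column for sex and needs the multi-predictor form of the estimator.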

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional user data can be represented as:

Y = w1 x1 + w2 x2 + ... + wk xk
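
One common way to estimate the weights W is least squares via the normal equations (X^T X) w = X^T y. The sketch below does this in plain Python for a small made-up data set; the helper names solve, fit_weights and predict are hypothetical, and the code assumes X^T X is invertible. An intercept can be included by adding a constant column of 1s to X.

```python
def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_weights(X, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, x):
    """Y = w1*x1 + w2*x2 + ... + wk*xk for a single observation x."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Made-up, noise-free data generated from Y = 2*x1 - 1*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [2 * x1 - x2 for x1, x2 in X]

w = fit_weights(X, y)
print(w, predict(w, [1.0, 1.0]))
```

Because the example data contain no noise, the recovered weights match the generating ones up to floating-point error. For real data, a tested routine from a numerical library is usually a better choice than hand-written elimination.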

Reading: Statistical Modeling: The Two Cultures, by Leo Breiman, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a scientific software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Plotting conditional density of prediction after linear regression

This is my data frame: data <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715, 1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692, 0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797, 1.27,…
user4381526
2
votes
0 answers

Robust Linear Regression with Caret package in R

I want to fit a robust linear regression with interaction terms in Caret package in R but I obtain the following error: Error in train.default(x, y, weights = w, ...) : Stopping In addition: Warning message: In nominalTrainWorkflow(x = x, y = y,…
EanX
  • 475
  • 4
  • 21
2
votes
2 answers

Gradient descent math implementation explanation needed.

I know the solution but I don't understand how the following equation was translated to code. Why is the sum missing? Why are we transposing the (sigmoid(X * theta)-y) expression? Solution: grad = (1/m) * ((sigmoid(X * theta)-y)' * X);
2
votes
2 answers

Gradient Descent and Closed Form Solution - Different Hypothesis Lines in MATLAB

I'm in the process of coding what I'm learning about Linear Regression from the Coursera Machine Learning course (MATLAB). There was a similar post that I found here, but I don't seem to be able to understand everything. Perhaps because my…
2
votes
1 answer

lmPerm P-Values Different depending on Order of Coefficients

I am getting different results from lmPerm based on the order in which I enter the variables in the function call. For example, placing NCF.pf before TotalProperties yields the following: pfit <- lmp(NetCashOps ~ NCF.pf + TotalProperties, data =…
Rymatt830
  • 129
  • 9
2
votes
1 answer

Get solution to overdetermined linear homogeneous system numpy

I'm trying to find the solution to an overdetermined linear homogeneous system (Ax = 0) using numpy in order to get the linear least squares solution for a linear regression. This is the code I am using to generate the linear regression: N = 100 x_data…
sevolo
  • 21
  • 3
2
votes
1 answer

post-hoc test for one-way ANOVA with random effect

I have a continuous response variable yld and a categorical predictor check (with 3 levels). I did a one-way ANOVA and a post-hoc test to see which levels differ from each other. mdl<-aov(sqrt(var$yld) ~ var$check); summary(mdl);TukeyHSD(mdl) …
user53020
  • 889
  • 2
  • 10
  • 33
2
votes
1 answer

How to calculate mean values from a linear model in R?

I'm working with a dataset on conservation and its influence on biomass, in which fifty plots of land, each one hectare, were sampled at random from a ten thousand hectare area in Northern England. For each plot of land, the following variables…
hsmith
  • 35
  • 1
  • 1
  • 7
2
votes
1 answer

Basic questions about linear regression example from NVIDIA DIGITS

I have a lot of values for every day of one entire year. I want to verify whether they show some kind of similarity within each month (that is, whether the daily values correspond to the correct month) and/or predict the same months of a future year.…
2
votes
0 answers

R: can I get regsubsets() to in-/exclude variables by groups?

I'm working with a data frame containing a lot of indicator variables that I made from categorical variables using dummy(). When using regsubsets (from the leaps package), is there a way to make it include these indicators by group, not…
MissMonicaE
  • 709
  • 1
  • 8
  • 15
2
votes
1 answer

Replicate a regression using a random subset of data each time and check distribution of regression coefficients?

I'm working with a dataset comprising cars' prices, brands, mileage etc. I want the distribution of the coefficient for the regression of my independent variable (mileage) against price by running my regression 2,000 times, and by sampling 300…
HP-Nunes
  • 111
  • 1
  • 11
2
votes
1 answer

Is there a fast estimation of simple regression (a regression line with only intercept and slope)?

This question relates to a machine learning feature selection procedure. I have a large matrix of features - columns are the features of the subjects (rows): set.seed(1) features.mat <- matrix(rnorm(10*100),ncol=100) colnames(features.mat) <-…
dan
  • 6,048
  • 10
  • 57
  • 125
2
votes
1 answer

How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?

I am interested in calculating estimates and standard errors for linear combinations of coefficients after a linear regression in R. For example, suppose I have the regression and test: data(mtcars) library(multcomp) lm1 <- lm(mpg ~ cyl + hp, data…
Ralph M
  • 23
  • 4
2
votes
3 answers

Standardized residuals in SPSS not matching R rstandard(lm())

While looking for an R-related solution I found some inconsistency between R and SPSS (ver. 24) in computing standardized residuals in a simple linear model. It appears that what SPSS calls standardized residuals matches R's studentized residuals. I'm…
blazej
  • 1,678
  • 3
  • 19
  • 41
2
votes
5 answers

constrained linear regression / quadratic programming python

I have a dataset like this: import numpy as np a = np.array([1.2, 2.3, 4.2]) b = np.array([1, 5, 6]) c = np.array([5.4, 6.2, 1.9]) m = np.vstack([a,b,c]) y = np.array([5.3, 0.9, 5.6]) and want to fit a constrained linear regression y = b1*a +…
spore234
  • 3,550
  • 6
  • 50
  • 76