Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

height_i = b_0 + b_1 age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject
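As a rough illustration of Model 1, the sketch below estimates the intercept and slope by ordinary least squares using the closed-form formulas for a single predictor. The function name `fit_simple` and the ages and heights are made up for this example; they do not come from any real dataset.

```python
from statistics import mean

def fit_simple(ages, heights):
    """Ordinary least squares for height_i = b0 + b1 * age_i + e_i."""
    age_bar, height_bar = mean(ages), mean(heights)
    # Slope: co-variation of age and height divided by the variation of age.
    sxy = sum((a - age_bar) * (h - height_bar) for a, h in zip(ages, heights))
    sxx = sum((a - age_bar) ** 2 for a in ages)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = height_bar - b1 * age_bar
    return b0, b1

# Hypothetical ages (years) and heights (cm), for illustration only.
ages = [5, 8, 11, 14, 17]
heights = [109.0, 130.0, 144.0, 166.0, 181.0]

b0, b1 = fit_simple(ages, heights)
# Residuals are the estimated error terms, one per subject.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
print(b0, b1)  # 80.0 6.0
```

The residuals are not all zero here, which reflects the stochastic rather than deterministic relation between age and height.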

Model 2

height_i = b_0 + b_1 age_i + b_2 sex_i + ε_i, where the variable sex is dichotomous
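A minimal sketch of how Model 2 might be fitted, assuming the dichotomous variable sex is coded as 0 or 1 (which category gets which code is an arbitrary choice). The coefficients are obtained by solving the normal equations (X'X) b = X'y. The helper names `solve` and `fit_ols` and all data values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: age in years, sex coded 0/1, height in cm.
ages = [6, 9, 12, 15, 7, 10, 13, 16]
sex = [0, 0, 0, 0, 1, 1, 1, 1]
heights = [117.0, 133.0, 153.0, 169.0, 127.0, 143.0, 163.0, 179.0]

# Each row of the design matrix is [1, age_i, sex_i]; the leading 1 carries b0.
design = [[1.0, a, s] for a, s in zip(ages, sex)]
b0, b1, b2 = fit_ols(design, heights)
print(b0, b1, b2)
```

With this coding, b2 is the estimated difference in height between the two groups at the same age.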

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. E.g., a linear regression model for k-dimensional user data can be represented as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
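To make "learned from the data" concrete, here is a small sketch that learns the weights of a k-dimensional linear model by gradient descent on the mean squared error. This is only one of several ways to estimate the parameters. The names `predict` and `learn_weights`, the learning rate, the step count, and the data are all assumptions chosen for illustration.

```python
def predict(w, x):
    """Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k for one observation x."""
    return sum(wj * xj for wj, xj in zip(w, x))

def learn_weights(xs, ys, rate=0.05, steps=500):
    """Minimise the mean squared error by plain gradient descent."""
    n, k = len(xs), len(xs[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j in range(k):
                # Derivative of the squared error with respect to w_j, averaged over n.
                grad[j] += 2.0 * err * x[j] / n
        w = [wj - rate * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical 2-dimensional data generated from the weights (2, -1) without noise.
xs = [[1, 2], [2, 1], [3, 3], [4, 1], [0, 2]]
ys = [0, 3, 3, 7, -2]

w = learn_weights(xs, ys)
print([round(wj, 4) for wj in w])  # [2.0, -1.0]
```

The learning rate has to be small enough for the iteration to converge; the value used here suits this toy data and would need tuning for other inputs.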

Reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a language and environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Creating Specific linear regression equations from A larger equation using R

here is a sample of my data, which is found at this link: http://www.uwyo.edu/crawford/datasets/drugreactions.txt I made this equation for the data fit2 <- lm(Allergens~Gender*Race*Druglevel, data=dr) Which spit me out this I know how to…
Maxwell Chandler
  • 626
  • 8
  • 18
2
votes
1 answer

predict() in pandas statsmodels, adding independent variables

Data: https://courses.edx.org/c4x/MITx/15.071x_2/asset/climate_change.csv I'm building a multiple linear regression model with pandas: import pandas as pd import statsmodels.api as sm climate = pd.read_csv("climate_change.csv") climate_train =…
alkamid
  • 6,970
  • 4
  • 28
  • 39
2
votes
1 answer

Pandas with Fixed Effects

I'm using Pandas on Python 2.7. I have data with the following columns: State, Year, UnempRate, Wage I'm teaching a course on how to use Python for research. As the culmination of our project, I want to run a regression of UnempRate on Wage…
user3674422
  • 51
  • 2
  • 5
2
votes
3 answers

Computing statistical tests for linear regression in R

I am new to Stack Overflow and I am also new to R and statistics. I need to create a linear regression model to describe the weight of a car based on some variables in a given dataset. wtlm=lm(weight~foreign + cylinders + displacement + hp +…
Kim
  • 21
  • 2
2
votes
1 answer

Statsmodels - Wald Test for significance of trend in coefficients in Linear Regression Model (OLS)

I have used Statsmodels to generate an OLS linear regression model to predict a dependent variable based on about 10 independent variables. The independent variables are all categorical. I am interested in looking closer at the significance of the…
JHawkins
  • 47
  • 2
  • 6
2
votes
1 answer

R's sandwich package producing strange results for robust standard errors in linear model

I am trying to find heteroskedasticity-robust standard errors in R, and most solutions I find are to use the coeftest and sandwich packages. However, when I use those packages, they seem to produce queer results (they're way too significant). Both…
cgmil
  • 410
  • 2
  • 18
2
votes
0 answers

Multivariate Multiple Regression in Python

I am trying to perform a multivariate multiple linear regression, so I have multiple inputs and outputs that I am trying to optimize for. I would like to do this in Python. Are there any software packages that do this? I looked into scikit-learn and the…
2
votes
2 answers

Calculating the slope of each row in a large data set using R

I have a large data set of the following format: First column is type, and the subsequent columns are different times that 'type' happens. I want to calculate the slope of each row (~7000 rows) for subset T0-T2 and then t0-t2 and output that…
Anita
  • 45
  • 2
  • 4
2
votes
1 answer

R Variance Inflation Factors - Warning : No function found corresponding to methods exports from ‘SparseM’ for: ‘coerce’

I am playing around with the car library for R and have encountered the following warning after calling the variance_inflation_factors function on my data model. No function found corresponding to methods exports from ‘SparseM’ for: …
FinnM
  • 394
  • 1
  • 3
  • 17
2
votes
1 answer

specify model with selected terms using lm

A pretty straightforward for those with intimate knowledge of R full <- lm(hello~., hellow) In the above specification, linear regression is being used and hello is being modeled against all variables in dataset hellow. I have 33 variables in…
oivemaria
  • 453
  • 4
  • 20
2
votes
0 answers

R regressions in a loop

I have an Excel file with 12 columns. I need to regress six of these on one column (i.e. six univariate linear regressions). I would like to write a loop which does the regressions, and then store all the summary statistics (intercept, beta, R^2,…
claired
  • 21
  • 2
2
votes
1 answer

NaNs produced when plotting a linear model (lm) with R

I am trying to create a normal regression model and a logistic one to predict fraud in real estate data. I work with a mixed data set (categorical and numerical variables) where I have done the pre-processing and recoding so that I had balanced…
NuValue
  • 453
  • 3
  • 11
  • 28
2
votes
1 answer

CVlm with categorical variables: factor has new levels

I am using lm for MLR and CVlm for cross-validation. My data contains two categorical variables (one of them with 11 levels and the other one with only 2). Everything seems to work fine when using lm, the problem is when I try to use CVlm. I have…
user3231352
  • 799
  • 1
  • 9
  • 26
2
votes
1 answer

R: Selecting every two consecutive rows for ddplyr

This is my data Assay Sample Dilution meanresp number 1 S 0.25 68.55 1 1 S 0.50 54.35 2 1 S 1.00 44.75 3 My end goal is to apply a linear regression to every two consecutive rows…
Kabau
  • 79
  • 2
  • 8
2
votes
1 answer

How to regress Y on X using matlab?

Given : Y=[81 55 80 24 78 52 88 45 50 69 66 45 24 43 38 72 41 48 52 52 66 89]; X=[124 49 181 4 22 152 75 54 43 41 17 22 16 10 63 170 125 15 222 171 97 254]; I want to regress Y on X (simple linear regression). I tried with this code : b=…
blackbishop
  • 30,945
  • 11
  • 55
  • 76