Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

height_i = b_0 + b_1·age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject
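As a minimal sketch of how the intercept b0 and slope b1 of Model 1 might be estimated, the ordinary least squares closed form for a single predictor can be written in plain Python. The function name fit_simple_ols and the age and height values are hypothetical, invented for illustration; they are not real measurements.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y ~ b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical ages (years) and heights (cm), invented for illustration only.
ages = [5, 7, 9, 11, 13]
heights = [110.0, 122.0, 133.0, 144.0, 156.0]

b0, b1 = fit_simple_ols(ages, heights)
# Residuals play the role of the estimated error terms for each subject.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
```

With an intercept in the model, the residuals of a least squares fit sum to zero up to floating-point error, which is a quick sanity check on any implementation.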

Model 2

height_i = b_0 + b_1·age_i + b_2·sex_i + ε_i, where the variable sex is dichotomous

In linear regression, the response Y is modelled as a linear function of the input data X, and the unknown model parameters W are estimated or learned from the data. E.g., a linear regression model for k-dimensional input data can be represented as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
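One common way to estimate the weights w1 ... wk from data is least squares via the normal equations (X^T X) w = X^T y. The sketch below uses only the Python standard library; the helper names solve_linear_system and fit_least_squares, and the synthetic data, are assumptions made for illustration. In practice a numerical library would normally be used instead of hand-written elimination.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_least_squares(rows, y):
    """Estimate w in y ~ w1*x1 + ... + wk*xk via the normal equations."""
    k = len(rows[0])
    # Build X^T X (k by k) and X^T y (length k).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from hypothetical weights.
rows = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 4.0],
        [3.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
true_w = [2.0, -1.0, 0.5]
y = [sum(wi * xi for wi, xi in zip(true_w, r)) for r in rows]

w = fit_least_squares(rows, y)
print([round(wi, 6) for wi in w])
```

The model as written has no intercept; appending a constant 1.0 to every row would add one, with the corresponding weight acting as the intercept term.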

Further reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Best way to make a linear regression model from a split .csv dataset?

I'm generally quite new to Python, and I'm having trouble making a linear regression model. I need to make it from a training and test set from a large excel dataset (.csv). I've split the dataset already: import pandas as pd import numpy as np df…
2
votes
0 answers

What is the easiest way to get uncertainties on linear regression parameters?

I used to run the DROITEREG functions in a calc sheet. Here is an example : At the top left, there are the data and at the bottom the results of the function DROITEREG which are a 2 by 5 table. I wrote the labels of several cells. a and b are the…
Ger
  • 9,076
  • 10
  • 37
  • 48
2
votes
2 answers

Finding the slope trend from best fit lines

I am trying to figure out how to determine the slope trend from best fit lines that have points. Basically, once I have the trend in the slope, I want to plot multiple other lines with that trend in the same plot. For example: This plot is…
Cosmoman
  • 101
  • 1
  • 12
2
votes
0 answers

Speeding up the felm command in R (lfe library)

I am using the felm from the lfe library, and am running into serious speed issues when using a large data set. By large I mean 100 million rows. My data consists of one dependent variable and five categorical variables (factors). I am running…
splinter
  • 3,727
  • 8
  • 37
  • 82
2
votes
1 answer

FGLS Correcting Serial Correlation and heteroskedasticity with plm package in R

I am running a regression model with some heteroskedasticity and serial correlation and I am trying to solve both without changing my model specification. First, I have generated an OLS model and realized both problems, heteroskedasticity and…
2
votes
0 answers

Better way to save and load Linear model coefficients for prediction

I am working to make hourly coefficients for 50000 customers from one year data set. (365 rows*28 columns) I want to save these coefficients for prediction later in another R code file. Currently, I am saving a list of 24 hourly models using save…
Sripati
  • 71
  • 2
2
votes
1 answer

R: build separate models for each category

Short version: How to build separate models for each category (without splitting the data). (I am new to R) Long version: consider the following synthetic…
knightrider
  • 2,063
  • 1
  • 16
  • 29
2
votes
1 answer

Four-part formula syntax in R

I am using the lfe package for high dimensional fixed effects in R. I am having trouble when trying to run with no covariates. That is, only with fixed effects. My code is: library(lfe) data=read.csv("path_to//my_data.csv") y <- cbind(col1) x <-…
splinter
  • 3,727
  • 8
  • 37
  • 82
2
votes
1 answer

Calculating MSE: why are these two ways giving different results?

I am having some doubt regarding the calculation of MSE in R. I have tried two different ways and I am getting two different results. Wanted to know which one is the correct way of finding mse. First: model1 <- lm(data=d, x ~ y) rmse_model1 <-…
Julius Knafl
  • 429
  • 3
  • 14
2
votes
1 answer

Why does arm::standardize() fail to work on a lm object in a loop?

standardize() in the arm package fails for me when I define the formula object using as.formula and use it in lm(formula, data = df). Option A (which I don't want) standardizes inputs outside of lm. Option B tries (and fails) to standardize the lm…
Eric Green
  • 7,385
  • 11
  • 56
  • 102
2
votes
1 answer

In R, when fitting a regression with ordinal predictor variables, how do you suppress one of the polynomial contrast levels?

Below is some of the summary data from a mixed model I have run in R (produced by summary()): Fixed effects: Estimate Std. Error df t value Pr(>|t|) (Intercept) -3.295e-01 1.227e-01 3.740e+01 -2.683 0.0108…
Bajcz
  • 433
  • 5
  • 20
2
votes
0 answers

Linear regression model up to nth power of number

I know that when I'm using the lm() or glm() function to fit a regression model in R, it's possible to write interactions up to the n-th degree like this: fit <- glm(formula=outVar ~ (inVar1 + inVar2 + inVar3)^n, data=d) But is it possible to…
Eenoku
  • 2,741
  • 4
  • 32
  • 64
2
votes
1 answer

Gradient descent on linear regression not converging

I have implemented a very simple linear regression with gradient descent algorithm in JavaScript, but after consulting multiple sources and trying several things, I cannot get it to converge. The data is absolutely linear, it's just the numbers 0 to…
Alpha
  • 7,586
  • 8
  • 59
  • 92
2
votes
0 answers

Predict sentiment score using multiclass logistic regression with R

I am trying to create a sentiment analysis classifier using logistic regression with R (glmnet). Here is the R code: library(tidyverse) library(text2vec) library(caret) library(glmnet) library(ggrepel) Train_classifier <-…
2
votes
1 answer

Troubles with predict() function (probably easy to solve)

All, this is the first question I have asked in this forum. I am a beginner, as you all will immediately tell. I'm doing a small task in which I must compare a training model with a test model. The point is that the training model has many more rows than…
albert
  • 37
  • 1
  • 4