Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression formalizes relationships between variables as mathematical equations. It describes how one or more random variables are related to one or more other variables. The relationship between the variables is stochastic rather than deterministic.

Example

Height and age are probabilistically distributed over humans, and they are stochastically related. Knowing that a person is 30 years old changes the probability that this person is 4 feet tall. Likewise, knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

height_i = b0 + b1*age_i + ε_i, where b0 is the intercept, b1 is a parameter that age is multiplied by to get a prediction of height, ε_i is the error term, and i is the subject

Model 2

height_i = b0 + b1*age_i + b2*sex_i + ε_i, where the variable sex is dichotomous
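
As a rough illustration of the two models above, here is a minimal sketch that fits both by ordinary least squares with NumPy. The data are synthetic and the coefficients used to generate them (80, 5, 4) are made up for the example, as are the helper name `fit_ols` and the 0/1 coding of sex. It is not meant as a reference implementation.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ≈ X @ b (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(5, 18, size=n)   # synthetic ages in years
sex = rng.integers(0, 2, size=n)   # dichotomous variable, coded 0/1 (assumed coding)

# Synthetic heights generated from made-up coefficients plus Gaussian noise.
true_b = np.array([80.0, 5.0, 4.0])
height = true_b[0] + true_b[1] * age + true_b[2] * sex + rng.normal(0, 3.0, size=n)

# Model 1: height_i = b0 + b1*age_i + e_i
X1 = np.column_stack([np.ones(n), age])
b_model1 = fit_ols(X1, height)

# Model 2: height_i = b0 + b1*age_i + b2*sex_i + e_i
X2 = np.column_stack([np.ones(n), age, sex])
b_model2 = fit_ols(X2, height)

print("Model 1 (b0, b1):", b_model1)
print("Model 2 (b0, b1, b2):", b_model2)
```

The column of ones is what produces the intercept b0. The dichotomous variable simply enters as a 0/1 column, so b2 is the estimated shift in height between the two groups at a fixed age.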

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional user data can be represented as:

Y = w1 x1 + w2 x2 + ... + wk xk
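
To make "learned from the data" concrete, the following sketch estimates the weights w1..wk by batch gradient descent on the mean squared error and compares the result with the closed-form least-squares solution. The data are synthetic, and the function name `fit_gd`, the learning rate, and the iteration count are assumptions chosen for this example.

```python
import numpy as np

def fit_gd(X, y, lr=0.1, n_iter=2000):
    """Estimate w in y ≈ X @ w by batch gradient descent on mean squared error."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_iter):
        residual = X @ w - y                 # prediction error for every sample
        grad = (2.0 / n) * (X.T @ residual)  # gradient of mean((X @ w - y)**2)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                # k = 3 features on a common scale
w_true = np.array([1.5, -2.0, 0.5])          # made-up weights for the synthetic data
y = X @ w_true + rng.normal(0, 0.1, size=500)

w_gd = fit_gd(X, y)                              # "learned" iteratively
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)     # "estimated" in closed form

print("gradient descent:", w_gd)
print("least squares:   ", w_ls)
```

The features here are drawn on a common scale, which is why a fixed learning rate of 0.1 converges. With features on very different scales, gradient descent usually needs standardized inputs or a much smaller learning rate, otherwise the loss can grow instead of shrinking.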

Further reading: Leo Breiman, "Statistical Modeling: The Two Cultures", http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, a software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
2 answers

Different costs for underestimation and overestimation

I have a regression problem, but the cost function is different: The cost for an underestimate is higher than an overestimate. For example, if predicted value < true value, the cost will be 3*(true-predicted)^2; if predicted value > true value, the…
2
votes
2 answers

Segmented linear regression with discontinuous data

I have a dataset that looks to be piecewise linear. I would like to perform a segmented linear regression in R. The issue is that there is a discontinuity at the breakpoint. By using some pieces of code from this question I managed to get something,…
Tom Cornebize
2
votes
1 answer

TensorFlow / TFLearn LinearRegressor stops with a very high loss

I am using Tensorflow 1.2, here's the code: import tensorflow as tf import tensorflow.contrib.layers as layers import numpy as np import tensorflow.contrib.learn as tflearn tf.logging.set_verbosity(tf.logging.INFO) # Naturally this is a very…
sgzmd
2
votes
1 answer

How to predict a new value using simple linear regression log(y)=b0+b1*log(x)

How to predict a new given value of body using the ml2 model below, and interpret its output (new predicted output only, not model). Using the Animals dataset from the MASS package to build a simple linear regression…
Tuyen
2
votes
2 answers

how to interpret coefficients in log-log market mix model

I am running a multivariate OLS regression as below using weekly sales and media data. I would like to understand how to calculate the sales contribution when doing log transforms like log-linear, linear-log and log-log. For example: Volume_Sales =…
A.M. Das
2
votes
2 answers

Bayesian vs OLS

I found this question online. Can someone please explain in detail why using OLS is better? Is it only because the number of samples is not enough? Also, why not use all the 1000 samples to estimate the prior distribution? We have 1000 randomly…
Erin
2
votes
1 answer

LMS batch gradient descent with NumPy

I'm trying to write some very simple LMS batch gradient descent but I believe I'm doing something wrong with the gradient. The ratio between the order of magnitude and the initial values for theta is very different for the elements of theta so…
2
votes
1 answer

Function for out of sample testing a linear model

Can anyone recommend a function in R with which I can calculate the out-of-sample R-squared of a previously fitted linear model lm()? Regards and thanks in advance!
2
votes
2 answers

tensorflow linear regression error blows up

I am trying to fit a very simple linear regression model using tensorflow. However, the loss (mean squared error) blows up instead of reducing to zero. First, I generate my data: x_data = np.random.uniform(high=10,low=0,size=100) y_data = 3.5 *…
2
votes
1 answer

SARIMAX model out of sample prediction

I'm working on a SARIMAX model to predict the stock market in Python. I divided the data into training and testing data. After fitting my model on the training data, my goal is to predict the testing data (one-step prediction). When I add exogs to the model,…
2
votes
1 answer

Python exponential/linear curve fitting

This question is less about programming than it is about mathematics, but I would like some opinions. I'm trying to model the exponential decay behavior of this curve but as you can see there is a certain level of fluctuations/noise at the lower…
2
votes
2 answers

PanelOLS pandas linearmodels documentation

Does anybody know where I can find the full documentation regarding the PanelOLS from Pandas (from pandas.stats.plm import PanelOLS) and PanelOLS from Linearmodels (from linearmodels import PanelOLS)?
Valerio
2
votes
2 answers

Understanding why linear regression isn't treating my categorical variable as expected?

I'm reading about formulas and linear regression, and I'm having trouble understanding how to interpret the output of lm for a linear regression with multiple parameters and categorical variables. I think I understand how to interpret the output for…
Ben Rubin
2
votes
1 answer

How to make lm ignore NA columns

I'm trying to calculate a bunch of betas. Unfortunately, sometimes some of the columns are all NA. Here's a toy example: x = structure(c(0.946032318625641, -0.472255854964591, -0.570914946839299, -0.624246840976067, -0.484359645048786,…
lebelinoz
2
votes
1 answer

Recursive Feature Elimination with LinearRegression Python

So I'm working on a project that is using RFECV for feature selection and then doing ridge regression with the selected variables. The way the data set is structured I have a train_y = dependent variable, train_x = everything else in the data frame…
mswhitehead