Questions tagged [linear-regression]

For questions about the linear regression modelling approach.

Linear regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables, where the relationship is stochastic rather than deterministic.

Example

Height and age are probabilistically distributed over humans and are stochastically related: knowing that a person is 30 years old changes the probability that they are 4 feet tall, and knowing that a person is 13 years old changes the probability that they are 6 feet tall.

Model 1

height_i = b_0 + b_1 · age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient by which age is multiplied to predict height, ε_i is the error term, and i indexes the subject

Model 2

height_i = b_0 + b_1 · age_i + b_2 · sex_i + ε_i, where the variable sex is dichotomous
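
A minimal sketch of fitting Model 2 with Python's statsmodels formula API; the data frame df and its values are purely illustrative:

```
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame standing in for real measurements.
df = pd.DataFrame({
    "height": [150, 162, 175, 168, 180, 158],
    "age":    [13,  16,  30,  25,  40,  14],
    "sex":    ["F", "M", "M", "F", "M", "F"],
})

# Model 2: height_i = b_0 + b_1*age_i + b_2*sex_i + e_i.
# C(sex) dummy-codes the dichotomous variable; the intercept b_0 is implicit.
fit = smf.ols("height ~ age + C(sex)", data=df).fit()
print(fit.params)
```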

In linear regression, the response Y is modelled as a linear function of the data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional data can be written as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
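
Estimating the weights w_1, ..., w_k by ordinary least squares can be sketched with NumPy; the array shapes and values below are illustrative:

```
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                                   # n samples, k features
X = rng.normal(size=(n, k))
true_w = np.array([1.5, -2.0, 0.5])
Y = X @ true_w + rng.normal(scale=0.1, size=n)  # noisy linear response

# Least-squares estimate of the weight vector W = (w_1, ..., w_k).
w_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(w_hat)                                    # close to true_w
```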

Further reading: Leo Breiman, "Statistical Modeling: The Two Cultures", http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, free software for statistical computing and graphics, the function lm() implements linear regression.

6517 questions
9 votes, 2 answers

ValueError: Found input variables with inconsistent numbers of samples: [2750, 1095]

It would be really helpful if someone could help me understand this error and what I need to do to fix it. I cannot change my data. X = train[['id', 'listing_type', 'floor', 'latitude', 'longitude', 'beds',…
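
This error is raised when the X and y passed to an estimator have different numbers of rows; a minimal sketch with toy arrays (not the question's train data):

```
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(4)                   # only 4 targets -> mismatch

try:
    LinearRegression().fit(X, y)
except ValueError as err:
    print(err)   # inconsistent numbers of samples: [10, 4]

# Fix: build X and y from the same rows so their lengths match.
y = np.arange(10)
LinearRegression().fit(X, y)       # now works
```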
9 votes, 1 answer

How does R handle ordinal predictors in lm()?

As I understand it, when you fit a linear model in R using a nominal predictor, R essentially uses dummy 1/0 variables for each level (except the reference level) and then gives a regular old coefficient for each of these variables. What does it…
MissMonicaE • 709 • 1 • 8 • 15
9 votes, 3 answers

Getting 'ValueError: shapes not aligned' on SciKit Linear Regression

Quite new to SciKit and linear algebra/machine learning with Python in general, so I can't seem to solve the following: I have a training set and a test set of data, containing both continuous and discrete/categorical values. The CSV files are…
Koen • 947 • 1 • 10 • 15
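
A common cause of this error is pd.get_dummies() producing different columns for the train and test CSVs; one way to keep the matrices aligned, sketched with hypothetical frames:

```
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for the train/test CSVs in the question.
train = pd.DataFrame({"size": [50, 80, 120], "city": ["A", "B", "A"], "price": [100, 160, 230]})
test = pd.DataFrame({"size": [60, 95], "city": ["B", "C"]})

X_train = pd.get_dummies(train.drop(columns="price"))
X_test = pd.get_dummies(test)                  # has city_C, lacks city_A

# Align test columns to the training columns; missing dummies become 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

model = LinearRegression().fit(X_train, train["price"])
print(model.predict(X_test))
```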
9 votes, 1 answer

Spark load model and continue training

I'm using Scala with Spark 2.0 to train a model with LinearRegression. val lr = new LinearRegression() .setMaxIter(num_iter) .setRegParam(reg) .setStandardization(true) val model = lr.fit(data) this is working fine and I get good results. I…
Silu • 176 • 1 • 7
9 votes, 1 answer

Multi-variable linear regression with scipy linregress

I'm trying to train a very simple linear regression model. My code is: from scipy import stats xs = [[ 0, 1, 153] [ 1, 2, 0] [ 2, 3, 125] [ 3, 1, 93] [ 2, 24, 5851] [ 3, 1, 524] [ 4, 1, 0] [ 2,…
jbrown • 7,518 • 16 • 69 • 117
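
scipy.stats.linregress fits only a single predictor against a single response; for several predictor columns, a least-squares solve with NumPy is one alternative. The target values below are hypothetical, since the question is truncated:

```
import numpy as np

# First rows of xs from the question; ys is made up for illustration.
xs = np.array([[0, 1, 153], [1, 2, 0], [2, 3, 125], [3, 1, 93], [2, 24, 5851]], dtype=float)
ys = np.array([1.0, 0.5, 2.0, 1.5, 30.0])

# Add a column of ones so an intercept is estimated alongside the 3 slopes.
A = np.column_stack([xs, np.ones(len(xs))])
coefs, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(coefs)    # one coefficient per column of xs, plus the intercept
```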
9 votes, 1 answer

Spark ml and PMML export

I know that it's possible to export models as PMML with Spark-MLlib, but what about Spark-ML? Is it possible to convert LinearRegressionModel from org.apache.spark.ml.regression to a LinearRegressionModel from org.apache.spark.mllib.regression to be…
philippe • 121 • 1 • 6
9 votes, 0 answers

Bayesian error-in-variables (total least squares) model in R using MCMCglmm

I am fitting some Bayesian linear mixed models using the MCMCglmm package in R. My data includes predictors that are measured with error. I'd therefore like to build a model that takes this into account. My understanding is that a basic mixed…
Alberto • 133 • 5
9 votes, 1 answer

Feature mapping using multi-variable polynomial

Suppose we have a data matrix of data points and we are interested in mapping those points into a higher-dimensional feature space. We can do this using d-degree polynomials. Thus for a sequence of data points the new data matrix is I have…
Thoth • 993 • 12 • 36
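
A sketch of such a d-degree polynomial feature map using scikit-learn's PolynomialFeatures; the degree and data are illustrative:

```
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 7.0], [6.0, 6.0]])
y = np.array([2.0, 6.0, 15.0, 16.0, 35.0, 36.0])

d = 2
poly = PolynomialFeatures(degree=d, include_bias=False)
X_poly = poly.fit_transform(X)      # columns: x1, x2, x1^2, x1*x2, x2^2

model = LinearRegression().fit(X_poly, y)
print(X_poly.shape, model.coef_)    # (6, 5) and one weight per new feature
```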
9 votes, 2 answers

Normalization in sci-kit learn linear_models

If the normalization parameter is set to True in any of the linear models in sklearn.linear_model, is normalization applied during the score step? For example: from sklearn import linear_model from sklearn.datasets import load_boston a =…
mgoldwasser • 14,558 • 15 • 79 • 103
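
One way to make the scaling explicit, so that it is applied consistently inside fit, predict and score, is a Pipeline with a scaler; note that StandardScaler is not the same transform as normalize=True, and load_diabetes stands in for the question's load_boston here:

```
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# The scaler is (re)applied to X inside fit(), predict() and score(),
# so the scoring step sees the same preprocessing as the fitting step.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
print(model.score(X, y))
```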
9 votes, 2 answers

Return std and confidence intervals for out-of-sample prediction in StatsModels

I'd like to find the standard deviation and confidence intervals for an out-of-sample prediction from an OLS model. This question is similar to Confidence intervals for model prediction, but with an explicit focus on using out-of-sample data. The…
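
A sketch using statsmodels' get_prediction() on new data, which returns standard errors plus mean and observation intervals; the data here are simulated:

```
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 1)))
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)
results = sm.OLS(y, X).fit()

X_new = sm.add_constant(rng.normal(size=(5, 1)))   # out-of-sample rows
pred = results.get_prediction(X_new)
# Columns include mean, mean_se, mean_ci_* and obs_ci_* (prediction interval).
print(pred.summary_frame(alpha=0.05))
```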
9 votes, 2 answers

Does scikit-learn perform "real" multivariate regression (multiple dependent variables)?

I would like to predict multiple dependent variables using multiple predictors. If I understood correctly, in principle one could make a bunch of linear regression models that each predict one dependent variable, but if the dependent variables are…
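
scikit-learn's LinearRegression accepts a 2-D target array and fits one coefficient vector per dependent variable; for ordinary least squares this yields the same coefficients as fitting each target separately. A toy sketch:

```
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # 4 predictors
Y = rng.normal(size=(100, 2))      # 2 dependent variables

model = LinearRegression().fit(X, Y)
print(model.coef_.shape)           # (2, 4): one coefficient row per target
print(model.predict(X[:3]))        # predictions have shape (3, 2)
```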
9 votes, 2 answers

Weights with plm package

My data frame looks like something as follows: unique.groups<- letters[1:5] unique_timez<- 1:20 groups<- rep(unique.groups, each=20) my.times<-rep(unique_timez, 5) play.data<- data.frame(groups, my.times, y= rnorm(100), x=rnorm(100), POP= 1:100) I…
Zslice • 412 • 1 • 5 • 14
9 votes, 1 answer

Numpy linear regression with regularization

I'm not seeing what is wrong with my code for regularized linear regression. Unregularized I have simply this, which I'm reasonably certain is correct: import numpy as np def get_model(features, labels): return…
Marshall Farrier • 947 • 2 • 11 • 20
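
A hedged sketch of closed-form ridge (L2-regularized) least squares in NumPy; the function name get_model echoes the question, and intercept handling is omitted:

```
import numpy as np

def get_model(features, labels, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=30)
print(get_model(X, y, lam=0.5))
```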
9 votes, 2 answers

D3.js linear regression

I searched for some help on building linear regression and found some examples here: nonlinear regression function and also some js libraries that should cover this, but unfortunately I wasn't able to make them work properly: simple-statistics.js…
tomtomtom • 1,502 • 1 • 18 • 27
9 votes, 1 answer

Efficient 1D linear regression for each element of 3D numpy array

I have 3D stacks of masked arrays. I'd like to perform a linear regression for values at each row,col (spatial index) along axis 0 (time). The dimensions of these stacks varies, but a typical shape might be (50, 2000, 2000). My spatially-limited…
David Shean • 1,015 • 1 • 9 • 11
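
One vectorized approach is to reshape the stack to (time, pixels) and fit every pixel in a single np.polyfit call; masking is ignored in this sketch and the array is smaller than the question's typical shape:

```
import numpy as np

t, rows, cols = 50, 200, 200            # smaller than the question's (50, 2000, 2000)
stack = np.random.rand(t, rows, cols)   # stand-in for the masked stacks
time = np.arange(t, dtype=float)

flat = stack.reshape(t, -1)             # shape (t, rows*cols)
slope, intercept = np.polyfit(time, flat, 1)   # one linear fit per pixel column
slope = slope.reshape(rows, cols)
intercept = intercept.reshape(rows, cols)
print(slope.shape, intercept.shape)     # (200, 200) (200, 200)
```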