Questions tagged [linear-regression]

for issues related to the linear regression modelling approach

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related: knowing that a person is aged 30 changes the probability that this person is 4 feet tall, and knowing that a person is aged 13 changes the probability that this person is 6 feet tall.

Model 1

height_i = b0 + b1 * age_i + ε_i, where b0 is the intercept, b1 is the coefficient that multiplies age to give a prediction of height, ε_i is the error term, and i indexes the subject

Model 2

height_i = b0 + b1 * age_i + b2 * sex_i + ε_i, where the variable sex is dichotomous

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated (learned) from the data. E.g., a linear regression model for k-dimensional user data can be represented as:

Y = w1 x1 + w2 x2 + ... + wk xk
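As a minimal sketch of estimating the weights W from data (all names and values here are illustrative, not from any question below), the ordinary least squares solution can be computed directly with NumPy:

```python
import numpy as np

# Illustrative data: 100 samples with k = 3 features and known true weights
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)  # small noise term plays the role of ε

# Estimate W by minimising ||y - Xw||^2 (ordinary least squares)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With enough data and little noise, `w_hat` recovers the true weights closely.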

Recommended reading: Statistical Modeling: The Two Cultures (Leo Breiman, 2001): http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, free software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
11
votes
7 answers

Simple prediction using linear regression with python

data2 = pd.DataFrame(data1['kwh']) data2 kwh date 2012-04-12 14:56:50 1.256400 2012-04-12 15:11:55 1.430750 2012-04-12 15:27:01 1.369910 2012-04-12 15:42:06 1.359350 2012-04-12 15:57:10 …
Jimmys
  • 357
  • 1
  • 3
  • 14
11
votes
2 answers

Is Apache Spark less accurate than Scikit Learn?

I've recently been trying to get to know Apache Spark as a replacement for Scikit Learn; however, it seems to me that even in simple cases, Scikit converges to an accurate model far faster than Spark does. For example, I generated 1000 data points for…
11
votes
5 answers

Linear Regression and storing results in data frame

I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each categorical variable, and then store the t-stats in a data frame.…
Trexion Kameha
  • 3,362
  • 10
  • 34
  • 60
11
votes
2 answers

Model matrix with all pairwise interactions between columns

Let's say that I have a numeric data matrix with columns w, x, y, z and I also want to add in the columns that are equivalent to w*x, w*y, w*z, x*y, x*z, y*z since I want my covariate matrix to include all pairwise interactions. Is there a clean and…
encircled
  • 159
  • 1
  • 1
  • 7
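The question above is about R's model matrix, but the same pairwise-interaction construction can be sketched in Python with itertools.combinations (the matrix here is synthetic and illustrative):

```python
import numpy as np
from itertools import combinations

# Illustrative numeric data matrix whose four columns play the roles of w, x, y, z
rng = np.random.default_rng(1)
M = rng.normal(size=(5, 4))

# One product column per unordered pair of original columns:
# w*x, w*y, w*z, x*y, x*z, y*z
pairs = list(combinations(range(M.shape[1]), 2))
interactions = np.column_stack([M[:, i] * M[:, j] for i, j in pairs])
M_full = np.hstack([M, interactions])
```

For 4 original columns this yields 6 interaction columns, i.e. a 10-column covariate matrix.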
11
votes
4 answers

Why are LASSO in sklearn (python) and matlab statistical package different?

I am using LassoCV from sklearn to select the best model by cross-validation. I found that cross-validation gives different results depending on whether I use sklearn or the MATLAB statistical toolbox. I used MATLAB and replicated the example given in…
imsc
  • 7,492
  • 7
  • 47
  • 69
11
votes
5 answers

Why does lm run out of memory while matrix multiplication works fine for coefficients?

I am trying to do fixed effects linear regression with R. My data has columns dte yr id v1 v2 … I then decided to simply do this by making yr a factor and use lm: lm(v1 ~…
Alex
  • 19,533
  • 37
  • 126
  • 195
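The point behind the question's title is that the coefficients only require the small k-by-k cross-product matrices, whereas lm builds a full model frame. A hedged NumPy sketch of the matrix-multiplication route, on synthetic data:

```python
import numpy as np

# Synthetic regression problem: many rows, few coefficients
rng = np.random.default_rng(2)
n, k = 10_000, 5
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k - 1))])  # intercept column
beta_true = np.arange(1.0, k + 1.0)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: solve (X'X) beta = X'y.
# Beyond the two products, only k-by-k / k-vector objects are kept in memory.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

This trades lm's conveniences (standard errors, diagnostics) for a much smaller memory footprint.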
10
votes
3 answers

Ruby Library for doing Linear or NonLinear Least Squares Approximation?

Is there a Ruby library that allows me to do either linear or non-linear least squares approximation of a set of data? What I would like to do is the following: given a series of [x,y] data points, generate a linear or non-linear least squares…
Peter C
  • 101
  • 1
  • 4
10
votes
2 answers

How to add "greater than 0 and sums to 1" constraint to a regression in Python?

I am using statsmodels (open to other Python options) to run some linear regression. My problem is that I need the regression to have no intercept and to constrain the coefficients to the range (0,1) and also to sum to 1. I tried something like this (for…
amaatouq
  • 2,297
  • 5
  • 29
  • 50
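One way to sketch the constraint the question asks for (no intercept, coefficients in (0,1) summing to 1): for two predictors, substituting w2 = 1 - w1 reduces the problem to one-dimensional least squares, and the bound can then be enforced by clipping. This is an illustrative NumPy sketch on synthetic data, not a statsmodels feature; for the general k-predictor case, scipy.optimize.minimize with an equality constraint and bounds is the usual route.

```python
import numpy as np

# Synthetic data whose true weights are non-negative and sum to 1
rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.7 * x1 + 0.3 * x2 + 0.05 * rng.normal(size=n)

# With w2 = 1 - w1, the model y = w1*x1 + w2*x2 becomes
# (y - x2) = w1 * (x1 - x2): a one-parameter least-squares problem
d = x1 - x2
w1 = np.dot(d, y - x2) / np.dot(d, d)
w1 = float(np.clip(w1, 0.0, 1.0))  # enforce the (0, 1) bound
w2 = 1.0 - w1
```

The sum-to-one constraint holds exactly by construction; clipping handles the box constraint.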
10
votes
1 answer

perform Deming regression without intercept

I would like to perform Deming regression (or any equivalent of a regression method with uncertainties in both X and Y variables, such as York regression). In my application, I have a very good scientific justification to deliberately set the…
agenis
  • 8,069
  • 5
  • 53
  • 102
10
votes
3 answers

Multiple Linear Regression in Power BI

Suppose I have a set of returns and I want to compute its beta values versus different market indices. Let's use the following set of data in a table named Returns for the sake of having a concrete example: Date Equity Duration Credit …
Alexis Olson
  • 38,724
  • 7
  • 42
  • 64
10
votes
3 answers

Python/Matplotlib: adding regression line to a plot given its intercept and slope

Using the following small dataset: bill = [34,108,64,88,99,51] tip = [5,17,11,8,14,5] I calculated a best-fit regression line (by hand): yi = 0.1462*x - 0.8188 # yi = slope*x + intercept I've plotted my original data using Matplotlib like…
Beatdown
  • 187
  • 2
  • 7
  • 20
10
votes
2 answers

How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

I want to persist a lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like…
mpettis
  • 3,222
  • 4
  • 28
  • 35
10
votes
1 answer

R - Calculate Test MSE given a trained model from a training set and a test set

Given two simple sets of data: head(training_set) x y 1 1 2.167512 2 2 4.684017 3 3 3.702477 4 4 9.417312 5 5 9.424831 6 6 13.090983 head(test_set) x y 1 1 2.068663 2 2 4.162103 …
Jebathon
  • 4,310
  • 14
  • 57
  • 108
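The computation this question asks about, sketched with synthetic stand-ins for the truncated training and test sets: fit on the training set only, predict on the held-out set, and average the squared residuals.

```python
import numpy as np

# Synthetic stand-ins for the question's training_set and test_set
rng = np.random.default_rng(4)
x_train = np.arange(1.0, 21.0)
y_train = 2.0 * x_train + rng.normal(scale=0.5, size=20)
x_test = np.arange(1.0, 11.0)
y_test = 2.0 * x_test + rng.normal(scale=0.5, size=10)

# Fit y ~ x on the training data only
slope, intercept = np.polyfit(x_train, y_train, 1)

# Test MSE: mean squared prediction error on the held-out set
pred = slope * x_test + intercept
test_mse = np.mean((y_test - pred) ** 2)
```

The key point is that the model never sees the test responses; they enter only through the residuals.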
10
votes
1 answer

linear model with `lm`: how to get prediction variance of sum of predicted values

I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum. lm.tree <- lm(Volume ~ poly(Girth,2),…
CCID
  • 1,368
  • 5
  • 19
  • 35
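The quantity this question asks for has a closed form: the sum of predicted values is a'beta with a the column sums of the prediction design matrix, so its variance is a' Cov(beta) a. A NumPy sketch on synthetic data (the excerpt's Volume ~ poly(Girth, 2) model is R-specific, so the design here is illustrative):

```python
import numpy as np

# Synthetic fitted model: OLS with the usual covariance estimate
rng = np.random.default_rng(5)
n, k = 50, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)            # residual variance estimate
cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # Cov(beta) = sigma^2 (X'X)^-1

# Sum of predictions over new points is a'beta with a = column sums of X_new,
# so Var(sum) = a' Cov(beta) a
X_new = np.hstack([np.ones((5, 1)), rng.normal(size=(5, k - 1))])
a = X_new.sum(axis=0)
var_sum = a @ cov_beta @ a
se_sum = np.sqrt(var_sum)
```

Note this is the variance of the sum of fitted means (a confidence interval); for the sum of new observations, add n_new * sigma2 to account for the residual noise.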
10
votes
2 answers

plotly regression line R

I'm having a problem adding a regression line to a plotly scatter plot. I've used the following code: require(plotly) data(airquality) ## Scatter plot ## c <- plot_ly(data = airquality, x = Wind, y = Ozone, type = "scatter", mode =…
Monteiro
  • 101
  • 1
  • 1
  • 3