Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear regression formalizes relationships between variables as mathematical equations. It describes how one or more random variables are related to one or more other variables. The variables are related stochastically, not deterministically.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

height_i = b_0 + b_1 * age_i + ε_i, where b_0 is the intercept, b_1 is the coefficient that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject
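
As a minimal sketch of how the intercept and slope of Model 1 can be estimated, the ordinary least squares closed form for a single predictor is shown below. The function name fit_simple_ols and the age and height values are made up for illustration and do not come from any real dataset.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up ages (years) and heights (cm), for illustration only.
ages = [2, 4, 6, 8, 10, 12]
heights = [86, 102, 115, 128, 138, 149]

b0, b1 = fit_simple_ols(ages, heights)

# The residuals play the role of the error term in Model 1.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
```

When an intercept is included, the least squares residuals sum to zero up to floating-point rounding, which is a quick sanity check on any implementation.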

Model 2

height_i = b_0 + b_1 * age_i + b_2 * sex_i + ε_i, where the variable sex is dichotomous
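
To illustrate what a dichotomous variable does in Model 2, the sketch below assumes sex is coded as 0 or 1 and uses hypothetical coefficient values. Under that coding, the coefficient on sex is a constant shift between the two groups at any age.

```python
def predict_height(age, sex, b0, b1, b2):
    """Deterministic part of Model 2: b0 + b1*age + b2*sex, with sex coded 0 or 1."""
    return b0 + b1 * age + b2 * sex


# Hypothetical coefficients chosen only for illustration.
b0, b1, b2 = 80.0, 6.0, 4.0

# Difference between the two groups at the same age.
gap_at_5 = predict_height(5, 1, b0, b1, b2) - predict_height(5, 0, b0, b1, b2)
gap_at_10 = predict_height(10, 1, b0, b1, b2) - predict_height(10, 0, b0, b1, b2)
```

Both gaps equal the coefficient on sex, so the model describes two parallel lines with the same slope in age.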

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional user data can be written as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
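
One common way to estimate the weights in the k-dimensional model is to solve the normal equations (X^T X) w = X^T y. The sketch below does this in plain Python with Gaussian elimination. The helper names solve_linear_system and fit_least_squares and the data are assumptions made for illustration, and numerical libraries usually prefer more stable factorizations.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: columns of X are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_least_squares(rows, y):
    """Estimate w in Y = w_1 x_1 + ... + w_k x_k by solving (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Noise-free made-up data generated from w = [2, 3], so the fit should recover it.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
w = fit_least_squares(rows, y)
```

The model as written has no intercept. Appending a constant 1.0 to every row adds one, with the extra weight acting as the intercept.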

Further reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, the software environment for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2
votes
1 answer

Relationship between sklearn .fit() and .score()

While working with a linear regression model I split the data into a training set and test set. I then calculated R^2, RMSE, and MAE using the following: lm.fit(X_train, y_train) R2 = lm.score(X,y) y_pred = lm.predict(X_test) RMSE =…
Swede
  • 23
  • 1
  • 3
2
votes
1 answer

Wrong intercept in Spark linear regression

I am starting with Spark Linear Regression. I am trying to fit a line to a linear dataset. It seems that the intercept is not adjusting correctly, or I am probably missing something. With intercept=False: linear_model =…
javierdvalle
  • 2,473
  • 1
  • 13
  • 15
2
votes
1 answer

Multivariate Linear Regression in Python - analog of mvregress in MATLAB?

I want to use the same function or method in Python as mvregress in MATLAB. As an example, we have x1, x2, x3, x4, x5, x6 inputs and y1, y2, y3 outputs. After using this function we should get some estimated regression coefficients. Does Python have…
A. Innokentiev
  • 681
  • 2
  • 11
  • 27
2
votes
1 answer

Identify weakest feature in classification

A basic machine learning exercise is to perform a regression on some data. For instance, estimate the length of a fish as a function of weight and age. This is often done by having a large training data set (weight, age, length) and then apply some…
2
votes
1 answer

How to fit with a broken line in only one of two dependent variables?

Using the mtcars data set, I am trying to determine the broken line regression fit of mpg as a function of hp and wt, with breakpoints coming only from hp. Here is the code: mpg = mtcars$mpg wt = mtcars$wt hp = mtcars$hp reg = lm (mpg ~ hp…
pulp_fiction
  • 185
  • 1
  • 12
2
votes
3 answers

How to obtain adjusted dependent variables

Given the following dataset: csf age sex tiv group 0,30 7,92 1 1,66 1 0,26 33,75 0 1,27 3 0,18 7,83 0 1,43 2 0,20 9,42 0 1,70 1 0,29 22,33 1 1,68 2 0,40 20,75 1 1,56 1 0,26 …
Borja
  • 63
  • 2
  • 6
2
votes
1 answer

Gradient descent for linear regression takes too long to converge

I began to study machine learning and got stuck on one issue. My implementation of this method (both in MATLAB and C++) converges in 1 500 000 iterations, and I cannot understand why. I found the method implementation in Python, and the algorithm…
2
votes
1 answer

predict vector values instead of single output

In linear regression I've always seen the situation where I have many features and I use them to predict a single output, for example f1 f2 f3 f4 --> y1 f1 f2 f3 f4 --> y2 and so on... I want to know if there is something where the predicted value…
Exorcismus
  • 2,243
  • 1
  • 35
  • 68
2
votes
1 answer

sklearn variance for Linear Regression prediction

I am trying to fit a Linear model using LinearRegression from scikit. From the predict function, I get a point estimate prediction, but I need a distribution of the possible value with probably the point value from predict being the mean of a…
Fayaz Ahmed
  • 953
  • 1
  • 9
  • 23
2
votes
1 answer

How to fix .predict() function in statsmodels?

I'm trying to predict temperature at 12 UTC tomorrow in 1 location. To forecast, I use a basic linear regression model with the statsmodels module. My code is below: x = ds_main X = sm.add_constant(x) y = ds_target_t model =…
florian
  • 881
  • 2
  • 8
  • 24
2
votes
1 answer

Removing Bonferroni Outlier Test Results in a loop

I modeled my data using linear regression. I want to run the Bonferroni outlier test several times and delete the corresponding records from my data. My problem is: I cannot extract the id from outlierResult. Here is the reproducible code. I want to…
Hamideh
  • 665
  • 2
  • 8
  • 20
2
votes
0 answers

Doing linear regression in Torch gives NaN as error

I am new to Torch. Recently, I have been trying to use Torch to do multi-linear regression, but the error always becomes infinity and NaN. Over the first two errors, it is clearly increasing. Here is my code. dataset= 124.0000 81.6900 64.5000 …
2
votes
1 answer

How to minimize chi squared for 3 linear fits

from numpy import * import matplotlib.pyplot as plt import numpy as np # This is my data set x = [15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240] y = [1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29,…
PiccolMan
  • 4,854
  • 12
  • 35
  • 53
2
votes
1 answer

Java Linear Regression

I need to find the best fitting regression line for a set of points. For example for this matrix: int b [][] = { { 3, 1, 0, 0, 0, 0, 0, 0, 0 }, { 1, 2, 3, 1, 0, 1, 0, 0, 0 }, { 0, 1, 2, 1, 0, 0, 0, 0, 0…
TheFooBarWay
  • 594
  • 1
  • 7
  • 17
2
votes
1 answer

R: Robust linear regression using a list having repeated number

I'm using an rlm model like this. fit=rlm(log(y) ~ x + z) Z is a list that contains all 1. I get the error Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in…
zinon
  • 4,427
  • 14
  • 70
  • 112