Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans. They are stochastically related; when you know that a person is of age 30, this influences the chance of this person being 4 feet tall. When you know that a person is of age 13, this influences the chance of this person being 6 feet tall.

Model 1

height_i = b_0 + b_1 age_i + ε_i, where b_0 is the intercept, b_1 is a parameter that age is multiplied by to get a prediction of height, ε_i is the error term, and i indexes the subject

Model 2

height_i = b_0 + b_1 age_i + b_2 sex_i + ε_i, where the variable sex is dichotomous
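As a rough illustration, Model 1 and Model 2 above could be fitted by ordinary least squares along the following lines. This is only a sketch: the data are synthetic, the "true" coefficients (80, 5, 4), the noise level, and the helper name fit_ols are assumptions made up for the example, and sex is coded as 0/1.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ≈ X @ b (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data: every number here is invented for illustration only.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(2, 18, size=n)         # years
sex = rng.integers(0, 2, size=n)         # dichotomous variable coded 0 or 1
eps = rng.normal(0, 5, size=n)           # error term ε_i
height = 80 + 5 * age + 4 * sex + eps    # cm, with assumed b_0=80, b_1=5, b_2=4

# Model 1: height_i = b_0 + b_1 age_i + ε_i
X1 = np.column_stack([np.ones(n), age])
b_model1 = fit_ols(X1, height)

# Model 2: height_i = b_0 + b_1 age_i + b_2 sex_i + ε_i
X2 = np.column_stack([np.ones(n), age, sex])
b_model2 = fit_ols(X2, height)

print("Model 1 (b_0, b_1):", b_model1)
print("Model 2 (b_0, b_1, b_2):", b_model2)
```

The estimates only approximate the assumed coefficients because the error term is random; with real data the true values are unknown.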

In linear regression, the output Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated (learned) from the data. For example, a linear regression model for k-dimensional user data can be represented as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
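One way the parameters W might be "learned from the data" is gradient descent on the mean squared error. The sketch below assumes noise-free synthetic data with k = 3; the function names, the learning rate, the step count, and the weights in w_true are all hypothetical choices for illustration, not part of any particular library.

```python
import numpy as np

def predict(X, w):
    # Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k, evaluated for every row of X
    return X @ w

def fit_gradient_descent(X, y, lr=0.1, n_steps=2000):
    """Minimise the mean squared error (1/n) * sum((X @ w - y)**2) by gradient descent."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the mean squared error
        w -= lr * grad
    return w

# Synthetic, noise-free data with assumed weights (illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = predict(X, w_true)

w_hat = fit_gradient_descent(X, y)
print("learned weights:", w_hat)
```

For a problem this small, a direct least-squares solve would give the same answer; the iterative version is shown only because it makes the "learning" step explicit.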

Reading: Statistical Modeling: The Two Cultures (Leo Breiman), http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, software for statistical computing and graphics, the function lm implements linear regression.

6517 questions
2 votes, 2 answers

How can I ignore the NA data when I do the lm function?

My question is rather simple, but I could not get it resolved after trying a lot of things. I have two data frames.

> a
  col1 col2 col3 col4
1    1    2    1    4
2    2   NA    2    3
3    3    2    3    2
4    4    3    4    1
> b
  col1…
didimichael

2 votes, 1 answer

Linear regression with only previous values in moving window

I have a huge dataset and would like to perform a rolling linear regression over a window of 60. However, I want only the 60 previous values to be considered for the linear regression. My data frame DF consists of the following columns: Date …
Henky

2 votes, 1 answer

Precision_score and accuracy_score showing value error

I'm new to machine learning and am using the Boston dataset for predictions. Everything works fine except the results for precision_score and accuracy_score. This is what I have done: import pandas as pd import sklearn from…
harshi

2 votes, 1 answer

Plotting both a GLM and LM of same data

I would like to plot both a linear model (LM) and a generalized linear model (GLM) of the same data. The range between 16% and 84% should line up between the LM and the GLM. Citation: section 3.5. I have included a more complete chunk of the code because I am not…
Arch

2 votes, 0 answers

Linear regression accuracy 95%, but predicts past data

Given a pandas DataFrame with 4 rows of features, I create labels for them from "forecast_col" and shift them back in time to make predictions later: pandasdf['label'] = pandasdf[forecast_col].shift(-forecast_out) Taking all the rows except the…
2 votes, 2 answers

Least Squares Fit on Cubic Bezier Curve

I would like to fit a cubic Bezier curve to a set of 500 random points. Here's the code I have for the Bezier curve: import numpy as np from scipy.misc import comb def bernstein_poly(i, n, t): """ The Bernstein polynomial of n, i as a…
2 votes, 2 answers

By two combinations of predictors in linear regression in R

Suppose that I have X1,...,X14 potential predictors. Now, for a given Y, I want to fit the OLS models Y~X1+X2, Y~X1+X3, ..., Y~X1+X14, ..., Y~X14+X13, which are basically all the pairwise combinations of the predictors. After all those regressions…
Hercules Apergis

2 votes, 1 answer

How to handle missing data in machine learning?

I have a dataframe which is always missing information between 9pm on Fridays and 0am on Mondays. I'm using this data to make predictions through a linear regression algorithm, so this jump gums up my predictions: date timestamp …
mllamazares

2 votes, 1 answer

Gradient Descent For Multivariate Linear Regression

OK, so what exactly does this algorithm mean? What I know: i) alpha: how big the step for gradient descent will be. ii) ∑ (h_θ(x^(i)) - y^(i)): refers to the total error with the given values of theta. The error refers to the difference…
2 votes, 1 answer

How to obtain coefficient values from Spark-MLlib Linear Regression model (Scala)?

I'd like to obtain the coefficient values of a linear regression (LR) model in Spark MLlib. Here I use 'LinearRegressionWithSGD' to build the model, and you can find the sample from the following…
Ramkumar

2 votes, 1 answer

How to plot confidence bands for my weighted log-log linear regression?

I need to plot an exponential species-area relationship using the exponential form of a weighted log-log linear model, where mean species number per location/Bank (sb$NoSpec.mean) is weighted by the variance in species number per year…
2 votes, 3 answers

Spark ML Linear Regression - What Hyper-parameters to Tune

I'm using the LinearRegression model in Spark ML to predict price. It is a univariate regression (x=time, y=price). Assuming my data is clean, what are the usual steps to take to improve this model? So far, I tried tuning regularization…
gyoho

2 votes, 0 answers

Is there a way to intersect real-valued column with a sparse column?

crossed_column is able to intersect a few sparse (categorical) columns. Is there a way to intersect a real-valued column with a sparse column in a LinearRegressor ? The mathematical meaning of this seems clear: I need different weights at continuous…
noname7619

2 votes, 1 answer

Cost Function, what's the difference between sum(x) and ones(1,length(x)) *x?

I'm doing Professor Andrew Ng's Machine Learning course on Coursera. I'm trying to code the cost function. This was my first solution: J= (1/(2*m))* (ones(1,97) * (((X*theta)-y).^2 )); But it wasn't accepted, so I tried it with sum: J = 1 / (2 * m)…
GniruT

2 votes, 1 answer

How do I interpret the TukeyHSD output in R? (in relation to the underlying regression model)

I built a simple linear regression model with 'Score' as the dependent variable, and 'Activity' as the independent one. 'Activity' has 5 levels: 'listen' (reference level), 'read1', 'read2', 'watch1', 'watch2'. Call: lm(formula = Score ~…
fannilegoza