Questions tagged [linear-regression]

For issues related to the linear regression modelling approach.

Linear Regression is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. Here the variables are not deterministically but stochastically related.

Example

Height and age are probabilistically distributed over humans, and they are stochastically related. Knowing that a person is 30 years old changes the probability that this person is 4 feet tall. Likewise, knowing that a person is 13 years old changes the probability that this person is 6 feet tall.

Model 1

height_i = b_0 + b_1 age_i + ε_i, where b_0 is the intercept, b_1 is a parameter that age is multiplied by to get a prediction of height, ε_i is the error term, and i is the subject
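As a rough illustration of Model 1, the sketch below estimates the intercept and slope by ordinary least squares, using the closed-form formulas for a single predictor. The ages and heights are invented for the example (they are not real measurements), and the function name fit_simple_ols is hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Invented data: ages in years, heights in centimetres.
ages = [5, 8, 11, 14, 17]
heights = [110.0, 128.0, 144.0, 162.0, 174.0]

b0, b1 = fit_simple_ols(ages, heights)

# Residuals play the role of the error term for each subject i.
residuals = [h - (b0 + b1 * a) for a, h in zip(ages, heights)]
```

With an intercept in the model, the least-squares residuals sum to zero up to floating-point rounding, which is a quick sanity check on the fit.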

Model 2

height_i = b_0 + b_1 age_i + b_2 sex_i + ε_i, where the variable sex is dichotomous
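One possible way to fit Model 2 is to code the dichotomous variable as 0/1 and solve the normal equations. The sketch below does this in plain Python. The data are invented and deliberately noise-free, so the recovered coefficients can be checked against the values used to generate them. The helper names solve_linear_system and fit_ols are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented, noise-free data generated from height = 80 + 5*age + 4*sex.
ages = [6, 9, 12, 15, 7, 10, 13, 16]
sex = [0, 0, 0, 0, 1, 1, 1, 1]  # dichotomous variable coded as 0/1
heights = [80 + 5 * a + 4 * s for a, s in zip(ages, sex)]

# Design matrix: a column of ones for the intercept, then age, then sex.
design = [[1.0, float(a), float(s)] for a, s in zip(ages, sex)]
b0, b1, b2 = fit_ols(design, heights)
```

Under this coding, the coefficient on sex is the estimated difference in height between the two groups at a fixed age. Solving the normal equations directly is fine for a small example, but it can be numerically fragile for ill-conditioned data.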

In linear regression, the response Y is modelled as a linear function of the user data X, and the unknown model parameters W are estimated or learned from the data. For example, a linear regression model for k-dimensional user data can be represented as:

Y = w_1 x_1 + w_2 x_2 + ... + w_k x_k
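To make "learned from the data" concrete, here is a minimal sketch that learns the weights of this intercept-free model by batch gradient descent on the mean squared error. This is only one of several estimation methods. The data, the learning rate, the step count, and the names predict and fit_gradient_descent are all assumptions made for the illustration.

```python
def predict(w, x):
    """Linear model without intercept: the dot product of weights and features."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_gradient_descent(xs, ys, lr=0.1, steps=2000):
    """Learn weights by batch gradient descent on the mean squared error."""
    n = len(xs)
    k = len(xs[0])
    w = [0.0] * k
    for _ in range(steps):
        errors = [predict(w, x) - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(error^2) with respect to each weight.
        grad = [2.0 / n * sum(e * x[j] for e, x in zip(errors, xs))
                for j in range(k)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Invented, noise-free data generated from Y = 2*x1 - 1*x2 (k = 2).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, -1.0, 1.0, 3.0]

w = fit_gradient_descent(xs, ys)
```

The learning rate here was chosen to suit this small data set. On other data, a rate that is too large makes the updates diverge, so it would need tuning.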

Reading: Statistical Modeling: The Two Cultures, http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726

In R, the scientific software for statistical computing and graphics, the function lm (see lm) implements linear regression.

6517 questions

6 answers

AnalysisException: u"cannot resolve 'name' given input columns: [ list] in sqlContext in spark

I tried a simple example like: data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/samples/population-vs-price/data_geo.csv") data.cache() # Cache data for faster reuse data =…

python apache-spark linear-regression

asked Aug 18 '16 at 10:57

Elm662

1 answer

Pandas DataFrame - 'cannot astype a datetimelike from [datetime64[ns]] to [float64]' when using ols/linear regression

I have a DataFrame as follows: Ticker Date Close 0 ADBE 2016-02-16 78.88 1 ADBE 2016-02-17 81.85 2 ADBE 2016-02-18 80.53 3 ADBE 2016-02-19 80.87 4 ADBE 2016-02-22 83.80 5 ADBE 2016-02-23 83.07 ...and so on. The Date column is…

python pandas dataframe time-series linear-regression

asked Nov 06 '16 at 19:44

Cole Starbuck

2 answers

How can I plot my R Squared value on my scatterplot using R?

This seems a simple question, so I hope it's a simple answer. I am plotting my points and fitting a linear model, which I can do OK. I then want to plot some summary statistics, for example the R Squared value, on the plot also. I can only seem to…

r plot statistics linear-regression

asked Sep 21 '10 at 14:37

phrozenpenguin

5 answers

How to add a line of best fit to scatter plot

I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot. Here is my code: import matplotlib import matplotlib.pyplot as plt import pandas as panda import numpy as…

python pandas numpy matplotlib linear-regression

asked May 15 '16 at 03:12

JavascriptLoser

3 answers

Specifying which category to treat as the base with 'statsmodels'

I understand that when I have a categorical variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable 'Location' with values 'IndianOcean', 'Thailand',…

python linear-regression statsmodels categorical-data

asked Mar 16 '14 at 00:28

orome

2 answers

Is there a Java library for better linear regression? (E.g., iteratively reweighted least squares)

I am struggling to find a way to perform better linear regression. I have been using the Moore-Penrose pseudoinverse and QR decomposition with JAMA library, but the results are not satisfactory. Would ojAlgo be useful? I have been hitting…

java math matrix matrix-multiplication linear-regression

asked Dec 06 '11 at 20:22

user1062571

2 answers

How to compute AIC for linear regression model in Python?

I want to compute AIC for linear models to compare their complexity. I did it as follows: regr = linear_model.LinearRegression() regr.fit(X, y) aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1) def aic(y, y_pred,…

python linear-regression

asked Jul 11 '17 at 11:58

YNR

3 answers

How to check for correlation among continuous and categorical variables?

I have a dataset including categorical (binary) variables and continuous variables. I'm trying to apply a linear regression model for predicting a continuous variable. Can someone please let me know how to check for correlation among the categorical…

python linear-regression correlation categorical-data

asked Jun 22 '17 at 08:33

funnyguy

6 answers

AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS'

I am trying to use Ordinary Least Squares for multivariable regression, but it says that there is no attribute 'OLS' in the statsmodels.formula.api library. I am following the code from a lecture on Udemy. The code is as follows: import…

python machine-learning linear-regression statsmodels

asked Jun 04 '19 at 18:57

Shubham Trehan

1 answer

What is the most accurate method in python for computing the minimum norm solution or the solution obtained from the pseudo-inverse?

My goal is to solve: Kc=y with the pseudo-inverse (i.e. minimum norm solution): c=K^{+}y such that the model is a (hopefully) high-degree polynomial model f(x) = sum_i c_i x^i. I am especially interested in the underdetermined case where we have more…

python numpy precision linear-algebra linear-regression

asked Oct 22 '17 at 21:40

Charlie Parker

1 answer

plot.lm(): extracting numbers labelled in the diagnostic Q-Q plot

For the simple example below, you can see that there are certain points that are identified in the ensuing plots. How can I extract the row numbers identified in these plots, especially the Normal Q-Q plot? set.seed(2016) maya <-…

r plot regression linear-regression lm

asked Jun 18 '16 at 13:04

Reuben Mathew

4 answers

What is the BigO of linear regression?

How large a system is it reasonable to attempt to do a linear regression on? Specifically: I have a system with ~300K sample points and ~1200 linear terms. Is this computationally feasible?

big-o linear-regression blas gsl

asked Dec 23 '09 at 20:22

BCS

2 answers

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample

When I predict one sample from my data, I get a reshape error, even though my model has an equal number of rows. Here is my code: import pandas as pd from sklearn.linear_model import LinearRegression import numpy as np x = np.array([2.0 , 2.4, 1.5,…

python python-3.x machine-learning scikit-learn linear-regression

asked Nov 01 '19 at 17:56

user11585758

2 answers

Linear regression with dummy/categorical variables

I have a set of data. I have used pandas to convert them into dummy and categorical variables respectively. So now I want to know how to run a multiple linear regression (I am using statsmodels) in Python. Are there some considerations or maybe I…

python pandas linear-regression statsmodels dummy-variable

asked Jun 07 '18 at 04:34

Héctor Alonso

1 answer

Using a smoother with the L Method to determine the number of K-Means clusters

Has anyone tried to apply a smoother to the evaluation metric before applying the L-method to determine the number of k-means clusters in a dataset? If so, did it improve the results? Or allow a lower number of k-means trials and hence much greater…

algorithm cluster-analysis k-means linear-regression

asked Oct 27 '10 at 13:35

winwaed
