Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

Regression estimates the strength and form of the relationship between one dependent variable (usually denoted by Y) and one or more independent variables. Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
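As a minimal sketch of the idea, the simplest case is a straight-line fit of one dependent variable on one independent variable by ordinary least squares. The function name `fit_simple_linear_regression` and the toy data are illustrative assumptions, not part of any particular package; real projects would normally rely on a library such as statsmodels, scikit-learn, or R's `lm`.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y ~ intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_linear_regression(x, y)

# Fitted values: the modeled mean of Y at each observed x.
predicted = [intercept + slope * xi for xi in x]
```

Here the fitted line plays the role of the parameter (the mean of Y) that varies deterministically with the independent variable; the scatter of the observations around it is what the probability distribution describes.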

Tag usage

Questions on Stack Overflow should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
14
votes
3 answers

R print equation of linear regression on the plot itself

How do we print the equation of a line on a plot? I have 2 independent variables and would like an equation like this: y=mx1+bx2+c, where x1=cost and x2=targeting. I can plot the best-fit line, but how do I print the equation on the plot? Maybe I can't…
jxn
14
votes
4 answers

Find and draw regression plane to a set of points

I want to fit a plane to some data points and draw it. My current code is this: import numpy as np from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt points = [(1.1,2.1,8.1), (3.2,4.2,8.0), (5.3,1.3,8.2), …
Tobias Hermann
14
votes
1 answer

Regression and summary statistics by group within a data.table

I would like to calculate some summary statistics and perform different regressions by group within a data table, and have the results in "wide" format (i.e. one row per group with several columns). I can do it in multiple steps, but it seems like…
dnlbrky
14
votes
1 answer

Linear regression with pandas dataframe

I have a dataframe in pandas that I'm using to produce a scatterplot, and want to include a regression line for the plot. Right now I'm trying to do this with polyfit. Here's my code: import pandas as pd import matplotlib import matplotlib.pyplot as…
TimStuart
14
votes
3 answers

In R, how to add the fitted value column to the original dataframe?

I have a multiple regression model. I want to add the fitted values and residuals to the original data.frame as two new columns. How can I achieve that? My model in R is like this: BD_lm <- lm(y ~ x1+x2+x3+x4+x5+x6, data=BD) summary(BD) I also…
titi
14
votes
4 answers

Logistic regression with robust clustered standard errors in R

A newbie question: does anyone know how to run a logistic regression with clustered standard errors in R? In Stata it's just logit Y X1 X2 X3, vce(cluster Z), but unfortunately I haven't figured out how to do the same analysis in R. Thanks in…
danilofreire
14
votes
3 answers

Nonparametric quantile regression curves to scatterplot

I created a scatterplot (multiple groups GRP) with IV=time, DV=concentration. I wanted to add the quantile regression curves (0.025,0.05,0.5,0.95,0.975) to my plot. And by the way, this is what I did to create the scatter-plot: attach(E) ## E is…
shirleywu
13
votes
4 answers

Double clustered standard errors for panel data

I have a panel data set in R (time and cross section) and would like to compute standard errors that are clustered by two dimensions, because my residuals are correlated both ways. Googling around I found…
Alex
13
votes
2 answers

Calculation of R^2 value for a non-linear regression

I would first like to say that I understand that calculating an R^2 value for a non-linear regression isn't exactly correct or valid. However, I'm in a transition period of moving most of our work from SigmaPlot over to R and for…
sinclairjesse
13
votes
2 answers

What is the negative mean absolute error in scikit-learn?

I am trying to train a model using scikit-learn's SVM module. For the scoring, I could not find the mean_absolute_error (MAE); however, negative_mean_absolute_error (NMAE) does exist. What is the difference between these 2 metrics? Let's say I get the…
darkhorse
13
votes
2 answers

Shape not aligned error in OLS Regression python

I have a dataframe on which I am trying to run the statsmodels.api OLS regression. It prints the summary, but when I use the predict() function, it gives me an error: shapes (75,7) and (6,) not aligned: 7 (dim 1) != 6 (dim 0) My…
Trisa Biswas
13
votes
1 answer

Fast pairwise simple linear regression between variables in a data frame

I have seen pairwise or general paired simple linear regression many times on Stack Overflow. Here is a toy dataset for this kind of problem. set.seed(0) X <- matrix(runif(100), 100, 5, dimnames = list(1:100, LETTERS[1:5])) b <- c(1, 0.7, 1.3, 2.9,…
Zheyuan Li
13
votes
1 answer

Transformed beta regression in python

Has anyone tried implementing transformed beta regression in Python? It is used to model values that lie between 0 and 1 and whose distribution is inherently heteroskedastic. Essentially you first transform the dependent variable to a…
user8947896
13
votes
3 answers

difference between LinearRegression and svm.SVR(kernel="linear")

First, there are questions on this forum very similar to this one, but trust me, none matches, so no duplicating please. I have encountered two methods of linear regression using scikit-learn (sklearn) and I am failing to understand the difference between the…
13
votes
2 answers

Difference between Linear Regression Coefficients between Python and R

I'm trying to run a linear regression in Python that I have already done in R in order to find variables with 0 coefficients. The issue I'm running into is that the linear regression in R returns NAs for columns with low variance while the scikit…
Nizag