Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression is a statistical technique that attempts to determine the strength of the relationship between a dependent variable (usually denoted by Y) and a set of other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
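
As a minimal illustration (the data, formula, and variable names below are invented for the sketch), an ordinary least-squares regression in R can be fit, summarized, and used for prediction like this:

```r
# Toy data: y depends linearly on x plus noise.
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)      # true intercept 2, true slope 3

fit <- lm(y ~ x)                 # ordinary least squares
summary(fit)                     # coefficients, standard errors, R^2
predict(fit, newdata = data.frame(x = c(-1, 0, 1)))   # predictions for new x
```
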

Tag usage

Questions tagged [regression] should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.


9532 questions
8
votes
2 answers

regression by group and retain all the columns in R

I am doing a linear regression by group and want to extract the residuals of the regression library(dplyr) set.seed(124) dat <- data.frame(ID = sample(111:503, 18576, replace = T), ID2 = sample(11:50, 18576, replace = T), …
89_Simple
  • 3,393
  • 3
  • 39
  • 94
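
For the group-wise residuals question above, a minimal dplyr sketch (the column names x and y and the toy data are assumptions, not the asker's actual data) is to fit lm() inside mutate(), which appends the residuals while keeping every original column:

```r
library(dplyr)

set.seed(124)
dat <- data.frame(ID = sample(1:20, 1000, replace = TRUE),
                  x  = rnorm(1000),
                  y  = rnorm(1000))

# mutate() preserves all existing columns, so the per-group residuals are
# simply added as a new column.
dat_res <- dat %>%
  group_by(ID) %>%
  mutate(resid = residuals(lm(y ~ x))) %>%
  ungroup()

head(dat_res)
```
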
8
votes
2 answers

R: Force regression coefficients to add up to 1

I'm trying to run a simple OLS regression with the restriction that the coefficients of two variables add up to 1. I want: Y = α + β1 * x1 + β2 * x2 + β3 * x3, where β1 + β2 = 1. I have found how to make a relation between coefficients…
Daniel
  • 639
  • 8
  • 24
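
For the sum-to-one constraint above, one common approach (sketched here with invented data; Y, x1, x2, x3 are the names from the question) is to substitute β2 = 1 − β1 and fit the reparameterised model with plain lm():

```r
# Toy data for illustration only.
set.seed(1)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
dat$Y <- 1 + 0.7 * dat$x1 + 0.3 * dat$x2 + 2 * dat$x3 + rnorm(200, sd = 0.1)

# Y = a + b1*x1 + (1 - b1)*x2 + b3*x3   =>   Y - x2 = a + b1*(x1 - x2) + b3*x3
fit <- lm(I(Y - x2) ~ I(x1 - x2) + x3, data = dat)

b1 <- unname(coef(fit)["I(x1 - x2)"])
b2 <- 1 - b1                      # recovered so that b1 + b2 = 1 exactly
c(b1 = b1, b2 = b2)
```

Constrained-estimation packages can impose linear restrictions directly as well, but the substitution above needs nothing beyond base R.
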
8
votes
1 answer

Fitting a quadratic function in python without numpy polyfit

I am trying to fit a quadratic function to some data, and I'm trying to do this without using numpy's polyfit function. Mathematically I tried to follow this website https://neutrium.net/mathematics/least-squares-fitting-of-a-polynomial/ but somehow…
Ahmad Moussa
  • 876
  • 10
  • 31
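
The question above is about Python, but the linear algebra behind a hand-rolled polynomial fit is the same in any language; here is a sketch in R (the language used for the examples on this page) that builds the design matrix [1, x, x²] and solves the normal equations directly, with invented toy data:

```r
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- 2 + 0.5 * x - 1.3 * x^2 + rnorm(50, sd = 0.2)   # toy quadratic data

X    <- cbind(1, x, x^2)                  # design (Vandermonde-style) matrix
beta <- solve(t(X) %*% X, t(X) %*% y)     # normal equations: (X'X) b = X'y
beta                                      # intercept, linear, quadratic term
```
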
8
votes
1 answer

ConvergenceWarning: Maximum Likelihood optimization failed to converge

I am trying to use the ARIMA algorithm in statsmodels library to do forecasting on a time series dataset. It is a stock price dataset and when I feed normalized data to the model it gives the below error. Note: This is a uni-variate forecasting and…
Suleka_28
  • 2,761
  • 4
  • 27
  • 43
8
votes
3 answers

Test accuracy is greater than train accuracy what to do?

I am using a random forest. My test accuracy is 70% while my train accuracy is 34%. What should I do? How can I solve this problem?
8
votes
4 answers

Search for corresponding node in a regression tree using rpart

I'm pretty new to R and I'm stuck with a pretty dumb problem. I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting. Thanks to R the calibration part is easy to do and easy to control. #the…
antoine
  • 123
  • 1
  • 5
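
For the rpart node question above, a minimal sketch (toy formula and data assumed) uses the fitted object's where component, which maps each training row to a row of the tree frame; the frame's row names are the node numbers printed by print(fit):

```r
library(rpart)

fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

# fit$where: row index into fit$frame for every training observation.
# rownames(fit$frame): the node numbers shown when the tree is printed.
node_id <- as.integer(rownames(fit$frame))[fit$where]
head(data.frame(mtcars[, c("mpg", "wt")], node = node_id))
```
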
8
votes
6 answers

Extract interaction terms from regression estimates

This is a simple question but I couldn't find a clear and compelling answer anywhere. If I have a regression model with one or more interaction terms, like: mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars) coef(summary(mod1)) ## …
Thomas
  • 43,637
  • 12
  • 109
  • 140
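
For the interaction-terms question above, one simple approach (a sketch using the question's own mtcars model) is to filter the coefficient table by names containing a colon, since R names interaction coefficients with ":" between the terms:

```r
mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
cf   <- coef(summary(mod1))

# Interaction coefficients are the rows whose names contain ":".
cf[grepl(":", rownames(cf)), , drop = FALSE]
```
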
8
votes
1 answer

How to set a custom loss function in Spark MLlib

I would like to use my own loss function instead of the squared loss for the linear regression model in Spark MLlib. So far I can't find any part of the documentation that mentions whether it is even possible.
user4658980
8
votes
1 answer

Can I use dynlm without any lagged variables?

I am trying to fit a dynamic linear regression using the dynlm command in R, since I need to analyze my panel data but do not want to use panel regression. However, my model specification does not contain any lagged variables at all. Can I…
Eric
  • 528
  • 1
  • 8
  • 26
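
For the dynlm question above, a short sketch (with invented toy series) suggests that leaving the lag operators out is harmless: without L() or d() terms the fit reduces to an ordinary lm():

```r
library(dynlm)

set.seed(1)
y <- ts(cumsum(rnorm(100)))   # toy time series, assumed for illustration
x <- ts(cumsum(rnorm(100)))

fit <- dynlm(y ~ x)           # no lagged or differenced terms at all
summary(fit)
```
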
8
votes
1 answer

Multi-output regression model always returns the same value for a batch in Tensorflow

I have a multi-layer perceptron for a multi-output regression problem which predicts 14 continuous values. The following is the code snippet for the same: # Parameters learning_rate = 0.001 training_epochs = 1000 batch_size = 500 # Network…
Vasanti
  • 1,207
  • 2
  • 12
  • 24
8
votes
1 answer

Multivariate regression splines in R

Most people are probably familiar with bs from splines: library(splines) workingModel <- lm(mpg ~ factor(gear) + bs(wt, knots = 5) + hp, data = mtcars) bs(mtcars$wt, knots = 4) This uses a b-spline for the single variable weight, but you can also…
Carl
  • 5,569
  • 6
  • 39
  • 74
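
For the multivariate-spline question above, one way to move from a univariate bs() term to a spline surface inside a plain lm() is to interact two univariate b-spline bases (the df values below are arbitrary choices for the sketch):

```r
library(splines)

# Tensor-product-style surface in wt and hp built from two bs() bases.
fit <- lm(mpg ~ bs(wt, df = 3) * bs(hp, df = 3), data = mtcars)
summary(fit)$r.squared
```

For genuinely smooth tensor-product surfaces with automatic smoothness selection, mgcv::gam() with a te() term is a common alternative.
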
8
votes
1 answer

`lm` summary not display all factor levels

I am running a linear regression on a number of attributes including two categorical attributes, B and F, and I don't get a coefficient value for every factor level I have. B has 9 levels and F has 6 levels. When I initially ran the model (with…
Karen Roberts
  • 83
  • 1
  • 1
  • 4
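
The usual explanation for the question above is that, under R's default treatment contrasts, one level of each factor becomes the reference and is absorbed into the intercept, so it gets no row in the coefficient table. A minimal sketch with an invented three-level factor:

```r
set.seed(1)
dat <- data.frame(y = rnorm(60),
                  B = factor(sample(c("b1", "b2", "b3"), 60, replace = TRUE)))

fit <- lm(y ~ B, data = dat)
coef(summary(fit))   # rows for Bb2 and Bb3 only; "b1" is the baseline level
levels(dat$B)        # all three levels are still present in the data
```

A coefficient shown as NA, by contrast, usually means the level was empty or collinear with other terms.
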
8
votes
2 answers

ANN regression, linear function approximation

I have built a regular ANN–BP setup with one unit on input and output layer and 4 nodes in hidden with sigmoid. Giving it a simple task to approximate linear f(n) = n with n in range 0-100. PROBLEM: Regardless of number of layers, units in hidden…
8
votes
1 answer

Multiple regression analysis in R using QR decomposition

I am trying to write a function for solving multiple regression using QR decomposition. Input: y vector and X matrix; output: b, e, R^2. So far I've got this and am terribly stuck; I think I have made everything way too complicated: QR.regression <-…
AGMG
  • 83
  • 6
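
A minimal sketch of the QR approach asked about above (assuming X already carries an intercept column if one is wanted; the function name and toy data are invented):

```r
qr_regression <- function(y, X) {
  qrX <- qr(X)
  b   <- qr.coef(qrX, y)                       # solve R b = Q'y
  e   <- y - X %*% b                           # residuals
  R2  <- 1 - sum(e^2) / sum((y - mean(y))^2)   # coefficient of determination
  list(b = b, e = e, R2 = R2)
}

# Quick check against lm() on a toy design matrix.
X   <- cbind(1, mtcars$wt, mtcars$hp)
out <- qr_regression(mtcars$mpg, X)
cbind(qr = out$b, lm = coef(lm(mpg ~ wt + hp, data = mtcars)))
```
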
8
votes
2 answers

How does plot.lm() determine outliers for residual vs fitted plot?

How does plot.lm() determine what points are outliers (that is, what points to label) for residual vs fitted plot? The only thing I found in the documentation is this: Details sub.caption—by default the function call—is shown as a subtitle (under…
3x89g2
  • 257
  • 4
  • 10
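
For the plot.lm() question above: the labelling is controlled by the id.n argument (3 by default), and for the residuals-vs-fitted panel the id.n points with the largest absolute residuals are labelled. A small sketch (toy model assumed):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Label the 5 most extreme points instead of the default 3.
plot(fit, which = 1, id.n = 5)

# The same points, picked by hand:
head(sort(abs(residuals(fit)), decreasing = TRUE), 5)
```
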