Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between one dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
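As a rough illustration of the idea, here is a minimal sketch of an ordinary least squares fit using NumPy. The synthetic data, the true coefficients (2 and 3), the noise level, and all variable names are assumptions made up for this example; it is one simple instance of regression, not a recommended workflow.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2 + 3*x + Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Design matrix: a column of ones for the intercept, then the predictor
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find beta minimising ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Predicted mean of Y at new values of the independent variable
x_new = np.array([11.0, 12.0])
y_pred = intercept + slope * x_new

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Here the mean of Y is assumed to depend linearly on x, and the fitted `intercept` and `slope` should land close to the values used to generate the data. Dedicated libraries (statsmodels, scikit-learn, or `lm` in R) add standard errors, diagnostics, and handling of categorical predictors on top of this.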

Tag usage

Questions on Stack Overflow should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
11
votes
5 answers

Python Pandas: how to turn a DataFrame with "factors" into a design matrix for linear regression?

If memory serves me, in R there is a data type called factor which, when used within a DataFrame, can be automatically unpacked into the necessary columns of a regression design matrix. For example, a factor containing True/False/Maybe values would…
Setjmp
  • 27,279
  • 27
  • 74
  • 92
10
votes
2 answers

Applying a rolling window regression to an XTS series in R

I have an xts of 1033 daily returns points for 5 currency pairs on which I want to run a rolling window regression, but rollapply is not working for my defined function which uses lm(). Here is my data: > head(fxr) USDZAR …
Thomas Browne
  • 23,824
  • 32
  • 78
  • 121
10
votes
3 answers

R bootstrap regression with facet_wrap

Been practicing with the mtcars dataset. I created this graph with a linear model. library(tidyverse) library(tidymodels) ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point() + geom_smooth(method = 'lm') Then I converted the dataframe to…
hachiko
  • 671
  • 7
  • 20
10
votes
2 answers

Mutate_all except some columns

I have a dataframe containing a set of variables that I want to lag at different lengths so that I can use them in regressions later on (instead of lagging one variable at a time manually). I found this code on Stack Overflow that seems to do the…
Andycode
  • 171
  • 1
  • 1
  • 10
10
votes
1 answer

How to specify the prior for scikit-learn's Gaussian process regression?

As mentioned here, scikit-learn's Gaussian process regression (GPR) permits "prediction without prior fitting (based on the GP prior)". But I have an idea for what my prior should be (i.e. it should not simply have a mean of zero but perhaps my…
Mathews24
  • 681
  • 10
  • 30
10
votes
2 answers

L1 norm instead of L2 norm for cost function in regression model

I was wondering if there's a function in Python that would do the same job as scipy.linalg.lstsq but uses “least absolute deviations” regression instead of “least squares” regression (OLS). I want to use the L1 norm, instead of the L2 norm. In fact,…
Sara .Eft
  • 101
  • 1
  • 5
10
votes
1 answer

perform Deming regression without intercept

I would like to perform Deming regression (or any equivalent of a regression method with uncertainties in both X and Y variables, such as York regression). In my application, I have a very good scientific justification to deliberately set the…
agenis
  • 8,069
  • 5
  • 53
  • 102
10
votes
2 answers

What is the difference between RSE and MSE?

I am going through Introduction to Statistical Learning in R by Hastie and Tibshirani. I came across two concepts: RSE and MSE. My understanding is like this: RSE = sqrt(RSS/(N-2)), MSE = RSS/N. Now I am building 3 models for a problem and need to…
Scott Grammilo
  • 1,229
  • 4
  • 16
  • 37
10
votes
2 answers

Using and interpreting output from gvlma

I want to test whether all assumptions for my linear regression model hold. I did this manually and it seems to be fine. However, I want to double check with the function gvlma. The output I get is: gvlma(x = m_lag) Value p-value…
PCUnique
  • 127
  • 1
  • 1
  • 8
10
votes
6 answers

How to Calculate R^2 in Tensorflow

I am trying to do regression in Tensorflow. I'm not positive I am calculating R^2 correctly as Tensorflow gives me a different answer than sklearn.metrics.r2_score Can someone please look at my below code and let me know if I implemented the…
Matt Camp
  • 1,448
  • 3
  • 17
  • 38
10
votes
3 answers

Python Keras cross_val_score Error

I am trying to do this little tutorial on keras about regression: http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/ Unfortunately I am running into an error I cannot fix. If I just copy and paste the code I…
user7454972
  • 218
  • 3
  • 13
10
votes
2 answers

How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

I want to persist a lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like…
mpettis
  • 3,222
  • 4
  • 28
  • 35
10
votes
1 answer

R - Calculate Test MSE given a trained model from a training set and a test set

Given two simple sets of data: head(training_set) x y 1 1 2.167512 2 2 4.684017 3 3 3.702477 4 4 9.417312 5 5 9.424831 6 6 13.090983 head(test_set) x y 1 1 2.068663 2 2 4.162103 …
Jebathon
  • 4,310
  • 14
  • 57
  • 108
10
votes
1 answer

linear model with `lm`: how to get prediction variance of sum of predicted values

I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum. lm.tree <- lm(Volume ~ poly(Girth,2),…
CCID
  • 1,368
  • 5
  • 19
  • 35
10
votes
2 answers

plotly regression line R

Problem with adding a regression line to a 'plotly' scatter plot. I've done the following code: require(plotly) data(airquality) ## Scatter plot ## c <- plot_ly(data = airquality, x = Wind, y = Ozone, type = "scatter", mode =…
Monteiro
  • 101
  • 1
  • 1
  • 3