Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or more variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression estimates the relationship between a dependent variable (usually denoted by Y) and one or more other variables (known as independent variables). Typically the dependent variable is modeled with a probability distribution whose parameters are assumed to vary deterministically with the independent variables.
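As a minimal sketch of what fitting and using a regression model involves, the snippet below fits a one-predictor least-squares line in closed form and then predicts from it. The function names and the toy data are illustrative assumptions, not part of any library.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing squared error for y ~ a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Evaluate the fitted line at new x values."""
    return [intercept + slope * xi for xi in x_new]


# Toy data (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_ols(x, y)
predictions = predict(intercept, slope, [6.0])
```

In practice a library routine would normally replace this hand-rolled fit; the sketch only shows the fit-then-predict shape that most regression APIs share.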

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.

9532 questions
12
votes
1 answer

Test for Multicollinearity in Panel Data R

I am running a panel data regression using the plm package in R and want to control for multicollinearity between the explanatory variables. I know there is the vif() function in the car package; however, as far as I know, it cannot deal with panel…
David
11
votes
1 answer

Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?)

I'm trying to calculate in R a projection matrix P of an arbitrary N x J matrix S: P = S (S'S)^-1 S'. I've been trying to perform this with the following function: P <- function(S){ output <- S %*% solve(t(S) %*% S) %*% t(S) …
bikeclub
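The projection in the question above can be sketched without inverting S'S: if S = QR with Q having orthonormal columns, then P = Q Q'. The question concerns R; the version below is a Python illustration only, using modified Gram-Schmidt for the thin QR step and assuming S has full column rank. The name hat_matrix and the example matrix are assumptions made for this sketch.

```python
def hat_matrix(S):
    """Projection onto the column space of S (a list of rows), via thin QR.

    Modified Gram-Schmidt builds an orthonormal basis Q of the columns,
    so P = Q Q^T and S'S is never inverted. Assumes full column rank.
    """
    n, j = len(S), len(S[0])
    cols = [[S[r][c] for r in range(n)] for c in range(j)]  # columns of S
    Q = []  # orthonormal basis vectors, each of length n
    for v in cols:
        w = list(v)
        for q in Q:
            # Remove the component of w along each earlier basis vector
            coef = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - coef * qi for qi, wi in zip(q, w)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm < 1e-12:
            raise ValueError("S does not have full column rank")
        Q.append([wi / norm for wi in w])
    # P[a][b] = sum over basis vectors of q[a] * q[b]
    return [[sum(q[a] * q[b] for q in Q) for b in range(n)] for a in range(n)]


# Intercept column plus one predictor (made-up example)
S = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
P = hat_matrix(S)
trace = sum(P[i][i] for i in range(len(P)))  # equals the rank of S
```

A quick sanity check on any such P is that it is symmetric, idempotent, and has trace equal to the number of columns of S.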
11
votes
3 answers

How do I deal with NAs in residuals in a regression in R?

So I am having some issues with some NA values in the residuals of a cross-sectional lm regression in R. The issue isn't the NA values themselves, it's the way R presents them. For example: test$residuals # 1 2 4 …
c00kiemonster
11
votes
1 answer

Implementing Longitudinal Random Forest with LongituRF package in R

I have some high-dimensional repeated measures data, and I am interested in fitting a random forest model to investigate the suitability and predictive utility of such models. Specifically I am trying to implement the methods in the LongituRF package.…
11
votes
7 answers

Fitting several regression models by changing only one independent variable within mutate()

I suspect that this question might be a duplicate, however, I found nothing satisfactory. Imagine a simple dataset with a structure like this: set.seed(123) df <- data.frame(cov_a = rbinom(100, 1, prob = 0.5), cov_b = rbinom(100, 1,…
tmfmnk
11
votes
2 answers

ValueError: Unable to coerce to Series, length must be 1: given n

I have been trying to use RF regression from scikit-learn, but I’m getting an error with my standard (from docs and tutorials) model. Here is the code: import pandas as pd import numpy as np from sklearn.ensemble import RandomForestRegressor db =…
11
votes
1 answer

XGBoost error - Unknown objective function reg:squarederror

I am training an XGBoost model for a regression task and passed the following parameters: params = {'eta':0.4, 'max_depth':5, 'colsample_bytree':0.6, 'objective':'reg:squarederror'} num_round = 10 xgb_model = xgboost.train(params, dtrain_x,…
Ankit Seth
11
votes
3 answers

How to interpret the regression plot obtained at the end of neural network regression for multiple outputs?

I have trained my neural network model using MATLAB NN Toolbox. My network has multiple inputs and multiple outputs, 6 and 7 respectively, to be precise. I would like to clarify a few questions about it: The final regression plot shown at the…
Manish
11
votes
3 answers

Use broom and tidyverse to run regressions on different dependent variables

I'm looking for a Tidyverse / broom solution that can solve this puzzle: Let's say I have different DVs and a specific set of IVs and I want to perform a regression that considers every DV and this specific set of IVs. I know I can use something…
Luis
11
votes
3 answers

Random Forest Regression - How do I analyse its performance? - python, sklearn

I'm struggling to assess the performance of my random forest - I've looked at the mean relative error, but I'm not sure if it's a good indicator. What are some things to check for? Also, how should I optimise my hyperparameters? I've used …
Julia
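For a question like the one above, a common starting point is to compare held-out targets against predictions with several error metrics rather than one. The sketch below computes mean absolute error, root mean squared error, and R² in plain Python; the function name and the sample numbers are assumptions for illustration and do not come from the question.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up held-out targets and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
```

These numbers are only meaningful when computed on data the model did not see during training.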
11
votes
1 answer

Multi-output regression

I have been looking into multi-output regression for the last few weeks. I am working with the scikit-learn package. My machine learning problem has an input of 3 features and needs to predict two output variables. Some ML models in the sklearn…
Matthijs Visser
11
votes
1 answer

Avoiding Dummy variable trap and neural network

I know that categorical data should be one-hot encoded before training the machine learning algorithm. I also know that for multivariate linear regression I need to exclude one of the encoded variables to avoid the so-called dummy variable trap. Ex: If I…
user3489820
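The dummy variable trap mentioned above can be illustrated with a small encoder: with every level kept, the indicator columns of each row sum to 1 and so duplicate an intercept column, while dropping one level removes that redundancy. The helper one_hot and the sample categories are hypothetical names and data for this sketch, not a library API.

```python
def one_hot(values, drop_first=False):
    """Encode categories as 0/1 columns; optionally drop the first level."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


colors = ["red", "green", "blue", "green"]

# Full encoding: each row sums to 1, which is collinear with an intercept
full_levels, full = one_hot(colors)

# Reduced encoding: the dropped level ("blue") becomes the all-zero baseline
kept_levels, reduced = one_hot(colors, drop_first=True)
```

Whether dropping a level matters for a neural network is the question's own open point; the sketch only shows the encoding difference.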
11
votes
2 answers

Python : How to interpret the result of logistic regression by sm.Logit

When I run a logistic regression using sm.Logit (from the statsmodels library), part of the result looks like this: Pseudo R-squ.: 0.4335 Log-Likelihood: -291.08 LL-Null: -513.87 LLR p-value: …
R.Yan
11
votes
2 answers

Plotting a 95% confidence interval for a lm object

How can I calculate and plot a confidence interval for my regression in R? So far I have two numerical vectors of equal length (x, y) and a regression object (lm.out). I have made a scatterplot of y given x and added the regression line to this plot.…
Max Lester
11
votes
5 answers

Extracting t-stat p values from lm in R

I have run a regression model in R using the lm function. The resulting ANOVA table gives me the F-value for each coefficient (which doesn't really make sense to me). What I would like to know is the t-stat for each coefficient and its corresponding…
zsad512