Questions tagged [regression]

Regression analysis is a collection of statistical techniques for modeling and predicting one or multiple variables based on other data.

Wiki

Regression is a common applied statistical technique and a cornerstone of machine learning. Various algorithms and software packages can be used to fit and use regression models.

In other words, regression attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables). Typically, the dependent variable is modeled with a probability distribution whose parameters are assumed to vary (deterministically) with the independent variables.
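As a concrete case, simple linear regression models the mean of Y as β0 + β1·X and estimates β0 and β1 by least squares. A minimal NumPy sketch on made-up toy data (`fit_ols` is a hypothetical helper, not part of any library):

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: return [intercept, slope_1, ..., slope_k]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single independent variable
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Toy data generated exactly as y = 1 + 2x, so the fit should recover (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
intercept, slope = fit_ols(x, y)
```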

Tag usage

Questions on [regression] should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics and machine learning.


9532 questions
2 votes, 1 answer

Adding fixed effects regression line to ggplot

I am plotting panel data using ggplot and I want to add the regression line for my fixed effects model "fixed" to the plot. This is the current code: # Fixed Effects Model in plm fixed <- plm(progenyMean ~ damMean, data=finalDT, model= "within",…
codemachino
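With `model = "within"`, plm estimates one slope shared by all entities after removing each entity's mean, so the fitted model corresponds to a family of parallel lines (one intercept per entity) rather than a single line. A minimal pure-Python sketch of that computation on toy data (`within_fit` is a hypothetical helper); the resulting slope and per-group intercepts are the kind of values one could pass to ggplot2's `geom_abline()`:

```python
from collections import defaultdict

def within_fit(groups, x, y):
    """Fixed-effects ("within") fit with one regressor.

    Demeaning x and y inside each group removes the group intercepts, so OLS
    on the demeaned data gives the common slope. Each group's intercept is
    then mean(y_g) - slope * mean(x_g).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, count]
    for g, xi, yi in zip(groups, x, y):
        sums[g][0] += xi
        sums[g][1] += yi
        sums[g][2] += 1
    means = {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}

    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        mx, my = means[g]
        num += (xi - mx) * (yi - my)
        den += (xi - mx) ** 2
    slope = num / den
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, intercepts

# Toy panel: y = 2x plus a group-specific intercept (10 for "a", -5 for "b").
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 1.0, 4.0, 6.0]
y = [12.0, 14.0, 16.0, -3.0, 3.0, 7.0]
slope, intercepts = within_fit(groups, x, y)
```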
2 votes, 4 answers

How can I improve the format of regression tables?

I've used R to run some logit regressions, testing the characteristics of investment firms and whether any of these characteristics predict sustainable behaviours. In my paper I've copied across the output from R; however, I've had feedback saying I should try to…
R Wd
2 votes, 1 answer

Is there a way to display regression coefficients in a pandas data frame for categorical independent variables?

I have built a multiple linear regression model and I found the coefficients using model.coef_. I want to make a pandas data frame which displays each of the factors and its coefficient. pd.DataFrame(model.coef_, x.columns, columns =…
Sona
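If the categorical predictors were one-hot encoded, the fitted coefficient vector has one entry per *encoded* column, so it has to be labelled with the columns of the encoded matrix rather than the raw `x.columns`. A minimal sketch on hypothetical data; `numpy.linalg.lstsq` stands in for the fitted model here, and an sklearn `LinearRegression` fitted on the same `X` would have `coef_` aligned with `X.columns` in the same way:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric and one categorical predictor.
df = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "colour": ["red", "blue", "red", "green", "blue", "green"],
    "price": [4.0, 3.0, 6.0, 8.0, 6.0, 10.0],
})

# One-hot encode; drop_first avoids perfect collinearity with the intercept
# (the dropped level, "blue" here, becomes the reference category).
X = pd.get_dummies(df[["size", "colour"]], drop_first=True, dtype=float)
y = df["price"]

# Least-squares fit with an intercept column prepended.
design = np.column_stack([np.ones(len(X)), X.to_numpy()])
coef, *_ = np.linalg.lstsq(design, y.to_numpy(), rcond=None)

# Label each slope with the *encoded* column name, not the raw one.
coef_table = pd.DataFrame({"coefficient": coef[1:]}, index=X.columns)
```

The toy prices were built as 1 + size + 2·(red) + 3·(green), so the table should show those slopes against `size`, `colour_red` and `colour_green`.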
2 votes, 1 answer

MANOVA effect size (partial eta squared) in R

For ANOVA, one can easily get the partial eta squared (np2) effect size with effectsize::eta_squared: > model <- aov(mpg ~ factor(cyl), data = mtcars) > effectsize::eta_squared(model) For one-way between subjects designs, partial eta squared is…
rempsyc
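For reference, the ANOVA quantity the question starts from is partial eta squared, SS_effect / (SS_effect + SS_error). A minimal sketch for a one-way between-subjects design on hypothetical groups (`partial_eta_squared_oneway` is a hypothetical helper); it does not cover the MANOVA case the question asks about:

```python
def partial_eta_squared_oneway(groups):
    """Partial eta squared for a one-way between-subjects ANOVA.

    eta_p^2 = SS_effect / (SS_effect + SS_error), where SS_effect is the
    between-group sum of squares and SS_error the within-group one.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_effect = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return ss_effect / (ss_effect + ss_error)

# Hypothetical groups: SS_effect = 13.5, SS_error = 4.0.
eta_p2 = partial_eta_squared_oneway([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```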
2 votes, 2 answers

How to do negative binomial regression with the rms package in R?

How can I use the rms package in R to execute a negative binomial regression? (I originally posted this question on Statistics SE, but it was closed apparently because it is a better fit here.) With the MASS package, I use the glm.nb function, but I…
Tripartio
2 votes, 1 answer

Clustered Standard Errors in SUR - sureg or gsem in Stata

I am trying to estimate the vote shares of different parties. I have 3 parties, each with its own column in the data set. Because the vote shares sum to 1, the errors are correlated, and I have to use Seemingly Unrelated Regressions…
Anisha Garg
2 votes, 0 answers

How to run a Hausman test for endogeneity during regression in Python using StatsModels?

How can I run a Hausman test for endogeneity in Python using StatsModels?
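One route that needs only least squares is the regression-based (control-function) form of the Durbin-Wu-Hausman test for a single suspect regressor; it assumes a valid instrument is available. A minimal NumPy sketch on synthetic data (`ols` and `wu_hausman_t` are hypothetical helpers, and the standard errors are the conventional homoskedastic ones); the same two regressions could equally be run with StatsModels' OLS:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def wu_hausman_t(y, x_endog, Z):
    """Regression-based Durbin-Wu-Hausman test for one suspect regressor.

    Stage 1: regress x_endog on a constant and the instruments Z.
    Stage 2: add the stage-1 residuals to the regression of y on x_endog.
    A large |t| on the residual term is evidence that x_endog is endogenous.
    """
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])
    gamma, _ = ols(Z1, x_endog)
    v_hat = x_endog - Z1 @ gamma                 # stage-1 residuals
    X2 = np.column_stack([np.ones(n), x_endog, v_hat])
    coef, se = ols(X2, y)
    return coef[2] / se[2]                       # t statistic on v_hat

# Synthetic data: z is a valid instrument, u is an unobserved confounder.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u
y_endog = 1 + 2 * x + 3 * u + rng.normal(size=n)  # x correlated with the error
y_exog = 1 + 2 * x + rng.normal(size=n)           # x uncorrelated with the error
t_endog = wu_hausman_t(y_endog, x, z)
t_exog = wu_hausman_t(y_exog, x, z)
```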
2 votes, 0 answers

Predict next integer in sequence using ML.NET

Given a lengthy sequence of integers in the range of 0-1 I would like to be able to predict the next likely integer. Example dataset: 1 1 1 0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 1 0…
keithl8041
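This is not ML.NET; it swaps in a much simpler technique, an order-k Markov chain that counts which symbol followed each recent context and predicts the most frequent one. A minimal Python sketch on a hypothetical toy sequence (`fit_markov` and `predict_next` are hypothetical helpers), useful as a baseline to compare any trained model against:

```python
from collections import Counter, defaultdict

def fit_markov(seq, order=2):
    """Count how often each symbol follows each context of `order` symbols."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts

def predict_next(counts, seq, order=2):
    """Most frequent successor of the last `order` symbols, or None if unseen."""
    followers = counts.get(tuple(seq[-order:]))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy sequence (not the question's data).
seq = [0, 1, 0, 1, 0, 1, 0, 1]
model = fit_markov(seq, order=2)
next_symbol = predict_next(model, seq, order=2)
```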
2 votes, 1 answer

How to assign different initial values inside geom_smooth() (from the ggplot2 package) for multiple nonlinear regressions?

I have 2 datasets, one for group A and one for group B. I want to visually fit the data into the y=1-exp(-kx) model. For this I'm using the ggplot2 package and the geom_smooth() function. Within the arguments of the geom_smooth() function it is…
Daniel Valencia C.
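One possible route is to fit each group separately, giving each fit its own starting value (in R, a separate `nls()` call per group with its own `start` list), and then plot the fitted curves. The same idea as a minimal Python sketch: a Gauss-Newton fit of the single parameter k in y = 1 - exp(-kx), with hypothetical synthetic group data and a hypothetical helper `fit_k`:

```python
import math

def fit_k(x, y, k0, iters=50, tol=1e-12):
    """Least-squares fit of y = 1 - exp(-k*x) by Gauss-Newton, starting at k0."""
    k = k0
    for _ in range(iters):
        r = [yi - (1 - math.exp(-k * xi)) for xi, yi in zip(x, y)]  # residuals
        J = [xi * math.exp(-k * xi) for xi in x]                      # df/dk
        step = sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)
        k += step
        if abs(step) < tol:
            break
    return k

# Synthetic data for two groups, generated from the model with known k.
x = [0.5, 1.0, 2.0, 4.0, 8.0]
data = {
    "A": (x, [1 - math.exp(-0.3 * xi) for xi in x]),
    "B": (x, [1 - math.exp(-1.5 * xi) for xi in x]),
}
start = {"A": 0.25, "B": 1.2}  # a different initial value for each group

k_hat = {g: fit_k(gx, gy, start[g]) for g, (gx, gy) in data.items()}
```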
2 votes, 1 answer

PyTorch loss function for regression model with a vector of values

I'm training a CNN architecture to solve a regression problem using PyTorch where my output is a tensor of 25 values. The input/target tensor could be either all zeros or a Gaussian distribution with a sigma value of 2. An example of a 4-sample…
Feng Shi
2 votes, 1 answer

R lme4 model: calculating effect size between continuous predictor's max-min value

I'm struggling to calculate an effect size between a continuous predictor's max-min value while using an R lme4 multilevel model. Simulated data: predictor "x" ranges from 1 to 3 library(tidyverse) n = 100 a = tibble(y = rep(c("pos", "neg", "neg",…
st4co4
2 votes, 1 answer

Different loss values and accuracies of MLP regressor in keras and scikit-learn

I have a neural network with one hidden layer implemented in both Keras and scikit-learn for solving a regression problem. In scikit-learn I used the MLPRegressor class with mostly default parameters and in Keras I have a hidden Dense layer with…
Ross
2 votes, 2 answers

Pandas: Compute a new column based on linear regression of previous row

My dataframe looks like this:
         date  Temperature  consumption
0  2020-12-01       8.0125   109.046450
1  2020-12-02       6.1500   104.494946
2  2020-12-03       5.9375   117.011582
3  2020-12-04       5.4750   109.615388
4  2020-12-05       3.8500   …
epsilon
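One reading of the question is an expanding-window fit: for each row, regress consumption on Temperature using only the rows before it, and predict the current row. A minimal pandas/NumPy sketch on the four complete rows shown (the helper `expanding_prediction` and the `min_rows=2` cutoff are assumptions):

```python
import numpy as np
import pandas as pd

# First four rows from the question's dataframe (row 4 is cut off).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-03", "2020-12-04"]),
    "Temperature": [8.0125, 6.1500, 5.9375, 5.4750],
    "consumption": [109.046450, 104.494946, 117.011582, 109.615388],
})

def expanding_prediction(df, x="Temperature", y="consumption", min_rows=2):
    """Predict y at each row from a straight-line fit on all *previous* rows."""
    preds = [np.nan] * len(df)
    for i in range(min_rows, len(df)):
        past = df.iloc[:i]
        slope, intercept = np.polyfit(past[x], past[y], deg=1)
        preds[i] = intercept + slope * df[x].iloc[i]
    return pd.Series(preds, index=df.index, name=f"{y}_pred")

df["consumption_pred"] = expanding_prediction(df)
```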
2 votes, 0 answers

Can you reverse scaling and centering in the axes of a ggplot2 plot?

I want to scale the predictor variable of a regression model but I then want to plot the original values on the x-axis for intelligibility using ggplot2. I have attempted to do this using scale_x_continuous(). library('tidyverse') x <- rnorm(100,…
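If the model uses the standardized predictor z = (x - mean(x)) / sd(x), the ticks can stay in z units but carry labels in original units: each chosen original value b goes at position (b - mean) / sd. A minimal Python sketch of that arithmetic (`scaled_breaks` is a hypothetical helper); in ggplot2 the positions and labels could then be passed to `scale_x_continuous(breaks = ..., labels = ...)`:

```python
import statistics

def scaled_breaks(x, original_breaks):
    """Map axis breaks chosen in original units to z-score positions."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # sample SD (n - 1), as R's scale() uses
    return [(b - mean) / sd for b in original_breaks]

# Toy predictor values and the original-unit ticks we want to show.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
breaks = [1.0, 3.0, 5.0]
positions = scaled_breaks(x, breaks)  # labels stay as `breaks`
```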
2 votes, 1 answer

Exclude NA values only and not entire rows in a lm in R?

Suppose I have a dataset like the following, recording species richness of spiders in different habitats of a garden:
'data.frame': 6 obs. of 5 variables:
 $ ID           : int 1 2 3 4 5 6
 $ species_count: num 10 13 15 17 22 9
 $…
Tim Clover
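A regression needs every model variable for each row it uses, so rows missing a *model* variable have to go; what can be avoided is dropping rows that are only missing values in columns the model does not use. A minimal pandas sketch: `species_count` matches the excerpt, while `leaf_litter`, `soil_ph` and their missing values are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "species_count": [10, 13, 15, 17, 22, 9],
    "leaf_litter": [2.0, 3.0, np.nan, 5.0, 7.0, 1.0],  # hypothetical predictor
    "soil_ph": [6.1, np.nan, 5.8, np.nan, 6.5, 7.0],   # hypothetical, unused here
})

# Drop rows only where a variable used in this model is missing.
model_df = df[["species_count", "leaf_litter"]].dropna()
slope, intercept = np.polyfit(model_df["leaf_litter"], model_df["species_count"], deg=1)
```

Calling `df.dropna()` on the whole frame would instead also discard the rows whose only gap is in the unused `soil_ph` column.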