
I am trying to compute marginal effects, following this post: http://andrewgelman.com/2016/01/14/rstanarm-and-more/

library(rstanarm)    # stan_glm(), posterior_predict(), posterior_linpred()
library(data.table)  # for the [, .()] and := syntax used below

td <- readRDS("some data")

CHAINS <- 1
CORES <- 1
SEED <- 42
ITERATIONS <- 2000
MAX_TREEDEPTH <- 9

md <- td[, .(y, x1, x2)] # selecting the columns I need; y is binary


glm1 <- stan_glm(y~x1+x2,
                 data = md,
                 family = binomial(link="logit"),
                 prior = NULL,
                 prior_intercept = NULL,
                 chains = CHAINS,
                 cores = CORES,
                 seed = SEED,
                 iter = ITERATIONS,
                 control=list(max_treedepth=MAX_TREEDEPTH)
)

# launch_shinystan(glm1) 


tmp <- posterior_predict(glm1,newdata=md[,.(x1,x2)])

Issue

After running this code I get an error that `y` is not found, which effectively means I also have to pass `y` in `newdata`. That should not be necessary according to `?posterior_predict`.
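For now, the error can be avoided by keeping `y` in `newdata` (a minimal workaround sketch; as far as I can tell the values of `y` are not used for the predictions themselves, only `x1` and `x2` are):

# hedged workaround: pass the full data, including y, as newdata
tmp <- posterior_predict(glm1, newdata = md)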

Reasoning

I need `tmp <- posterior_predict(glm1, newdata = md[,.(x1,x2)])` because, according to the post above (as far as I understand it), calculating the marginal effect of x1 (assuming x1 is binary) would look like this:

temp <- copy(md)              # copy() so := does not modify md by reference
temp[, x1 := 0]
temp[, x2 := mean(x2)]
number_0 <- posterior_predict(glm1, newdata = temp)

temp <- copy(md)
temp[, x1 := 1]
temp[, x2 := mean(x2)]
number_1 <- posterior_predict(glm1, newdata = temp)

marginal_effect_x1 <- number_1 - number_0
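For reference, `posterior_predict` returns a matrix of draws rather than one prediction per row, so `marginal_effect_x1` is a matrix too (a quick check, assuming the settings above, i.e. 1000 post-warmup draws from the single chain):

# each object is a draws x observations matrix of 0/1 posterior predictive outcomes
dim(number_0)             # here 1000 x nrow(md)
dim(marginal_effect_x1)   # same shape; each entry is -1, 0, or 1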
    Although it is not relevant to your question, using only 1 chain is not a good idea. And you should not have to _reduce_ `max_treedepth` from its default value (of 15 in rstanarm vs. 10 in rstan); leaving it at a higher value does not hurt anything when it is not reached. – Ben Goodrich Jul 11 '17 at 18:49
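If it helps, a minimal sketch of the call with those suggestions applied (more chains, no `control` override; the other arguments stay as in the question):

glm1 <- stan_glm(y ~ x1 + x2,
                 data = md,
                 family = binomial(link = "logit"),
                 prior = NULL, prior_intercept = NULL,
                 chains = 4, cores = 4,           # several chains for convergence diagnostics
                 seed = SEED, iter = ITERATIONS)  # leave max_treedepth at its default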

1 Answer


For a binary logit model, the marginal effect of a continuous variable is the derivative of the probability of success with respect to that variable, which by the chain rule is the logistic density (evaluated at some values of the predictors, usually the observed values of the predictors) multiplied by the coefficient of the variable in question. In your case, that would be

df <- as.data.frame(glm1)
ME <- df$x2 * dlogis(posterior_linpred(glm1))

Since this depends on the observed values of the predictors, it is common to average over the data with

AME <- rowMeans(ME)

In the case of a binary predictor, you can just subtract the probability of success when x1 = 0 from the probability of success when x1 = 1 via

nd <- md
nd$x1 <- 0
p0 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
nd$x1 <- 1
p1 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
ME <- p1 - p0
AME <- rowMeans(ME)

Ben Goodrich
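As a sanity check on the chain-rule formula above (my own sketch, not part of the answer; `eps` is an arbitrary small step and the `_x2`, `_lo`, `_hi`, `_fd` names are made up), the analytic marginal effect of x2 can be compared with a finite-difference derivative of the predicted probability:

# recompute the analytic ME for x2 so the comparison is self-contained
ME_x2 <- as.data.frame(glm1)$x2 * dlogis(posterior_linpred(glm1))

# finite-difference approximation of d P(y = 1) / d x2 at the observed data
eps <- 1e-4
nd_lo <- as.data.frame(md); nd_lo$x2 <- nd_lo$x2 - eps / 2
nd_hi <- as.data.frame(md); nd_hi$x2 <- nd_hi$x2 + eps / 2
p_lo <- posterior_linpred(glm1, newdata = nd_lo, transform = TRUE)
p_hi <- posterior_linpred(glm1, newdata = nd_hi, transform = TRUE)
ME_fd <- (p_hi - p_lo) / eps

summary(as.vector(ME_fd - ME_x2))   # differences should be tiny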
  • When I run the code for the binary case, I end up with a vector instead of a scalar. But the marginal effect is just a scalar, right? – quant Jul 12 '17 at 09:37
  • Moreover, why does `to_predict <- copy(md); to_predict[,x1:=0]; to_predict[,x2:=mean(x2)]; value0 <- unique(predict(glm1,newdata=to_predict)); to_predict[,x1:=1]; value1 <- unique(predict(glm1,newdata=to_predict)); ME <- value1 - value0` yield very different results from your answer? I would expect `?predict` and `?posterior_linpred` to produce similar results. – quant Jul 12 '17 at 11:29
  • `AME` is a vector because it is a posterior distribution. Conceptually, the average marginal effect is a scalar, but you do not know what scalar it is, so you have a distribution of beliefs about what it is. This is the same as any other quantity in the Bayesian estimation framework. – Ben Goodrich Jul 12 '17 at 12:20 (see the first sketch after these comments)
  • Also, the `predict` function is intended to be used after optimization, not MCMC or ADVI. The `predict` function uses the marginal medians of the coefficients to predict with, which ignores the correlation between the coefficient estimates and the uncertainty around them. So, it can be quite different from `posterior_linpred` and `posterior_predict`. – Ben Goodrich Jul 12 '17 at 12:25
  • Thank you for your thorough answer. But why do you use `posterior_linpred` instead of `posterior_predict`? These also yield somewhat different results. – quant Jul 13 '17 at 07:44
  • In the logit case, `posterior_predict` yields 0s and 1s. For a huge number of simulations, the average of the 0s and 1s will come out the same as the average for `posterior_linpred`, which yields a probability when `transform = TRUE`. But for a finite number of draws, it is a bit less noisy to use `posterior_linpred`. – Ben Goodrich Jul 13 '17 at 18:33 (see the second sketch after these comments)
  • So for the gaussian (normal regression) case, it would not matter which one to use? – quant Jul 14 '17 at 07:42
  • You would still get slightly different numbers. Conceptually, differentiating the linear predictor is what you are looking for with these marginal effects. – Ben Goodrich Jul 15 '17 at 13:08
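Following up on the vector-versus-scalar comment above, a minimal sketch of how one might report the `AME` posterior from the answer as a single number with uncertainty (the 95% interval is an arbitrary choice):

mean(AME)                           # posterior mean of the average marginal effect
median(AME)                         # or the posterior median
quantile(AME, c(0.025, 0.975))      # 95% credible interval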
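And a small check of the `posterior_predict` versus `posterior_linpred` point (my own sketch, using the fitted `glm1`): averaging the 0/1 predictive draws per observation should be close to, but noisier than, the posterior mean probability.

p_pred <- colMeans(posterior_predict(glm1))                    # share of 1s across draws, per observation
p_lin  <- colMeans(posterior_linpred(glm1, transform = TRUE))  # posterior mean probability, per observation
summary(p_pred - p_lin)                                        # differences are small and shrink with more draws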
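Finally, for the gaussian point in the last comment: with an identity link, the derivative of the linear predictor with respect to x2 is just its coefficient, so the marginal effect is simply the posterior draws of that coefficient (a hypothetical sketch; `glm_gauss` is not fit anywhere above and would only make sense for a continuous outcome):

# hypothetical gaussian / identity-link fit on a continuous outcome
glm_gauss <- stan_glm(y ~ x1 + x2, data = md, family = gaussian())
ME_gauss  <- as.data.frame(glm_gauss)$x2   # posterior draws of dE[y]/dx2, i.e. the coefficient itself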