5

I'm doing classwork using Bayesian inference. For this, I'm using the MCMCregress function, from MCMCpack. The problem comes when I want to get the residuals, because the function doesn't provide them, so I have to calculate them "by hand" (in R).

My model is:

Model <- MCMCregress(Y~X1+X2+X3+X4+X5, data=DATA)

where X1 and X5 are continuous, while X2, X3 and X4 are dichotomous. The model output provides me the estimate for each variable:

(Intercept) = 1.90, 
X1 = -0.02, 
X2 = -0.05, 
X3 = 0.32,
X4 = 0.61,
X5 = -0.003,
sigma2 = 0.54

I think I have to so something like:

1.90 - 0.02*X1 - 0.05*X2 + 0.32*X3 ...

But I know that I'm missing something important in the R code, so I would like to know which is the correct R code in order to get residuals.


Here is a reproducible example (although it does not correspond to the original data):

Y  <- c(0.2,0.8,1,4.3,5,3.5,3.2,1,3.3,1,2,4,3.6,3,5,4.3,3.2,4,1,2)
X1 <- c(17,13,15,NA,12,24,15,NA,12,22,14,12,18,NA,10,13,12,11,26,10)
X2 <- c(0,0,0,1,0,0,1,1,1,1,0,1,0,0,0,0,0,1,NA,NA)
X3 <- c(0,0,1,1,1,0,1,0,0,0,0,1,0,NA,0,NA,NA,0,1,0)
X4 <- c(1,0,1,0,0,1,0,0,NA,0,NA,NA,0,0,1,0,1,1,0,1)
X5 <- c(2.46,4.56,32.1,NA,NA,NA,NA,NA,NA,3.76,5.67,4.56,NA,
        17.32,12.2,4.56,7.2,1.2,NA,9.2)

X2 <- ifelse(X2 > 0, c("Yes"), c("No")) 
X3 <- ifelse(X3 > 0, c("Yes"), c("No")) 
X4 <- ifelse(X4 > 0, c("Yes"), c("No")) 

DATA <- data.frame(Y,X1,X2,X3,X4,X5)


library(MCMCpack)
MCMC <- MCMCregress(Y~X1+X2+X3+X4+X5, data=DATA)
summary(MCMC)

How can I get the residuals?

  • Hello! Yes, I'm asking for R code, so I've added a reproducible example (I hope it is useful) so, when you want, you can migrate my question to the right place. Sorry for the inconvenience. –  Jul 27 '14 at 16:34
  • That's no problem, @Alien. We'll get this over there where someone can help you. – gung - Reinstate Monica Jul 27 '14 at 16:37

1 Answers1

2

I found that something terrible happens to the MCMC chain toward the end of the runs -- check out xyplot(MCMC,layout=c(2,4)). I don't know what the problem is, you'll have to sort that out, but in the meantime I'm going to use the median rather than the mean as the coefficient estimates.

coefs <- apply(MCMC,2,median)[1:6]
X <- model.matrix(~X1+X2+X3+X4+X5, data=DATA)
pred0 <- X %*% coefs

The most complicated part is dealing with NA values: this is built into most of the existing residuals() methods ...

pred <- napredict(attr(na.exclude(DATA),"na.action"),pred0)
resids <- pred-DATA$Y

Note that in this case, once NA values are removed, you're fitting a 5-parameter regression model to 6 complete observations. You'll probably have to do something sensible with imputation to get any kind of reasonable answer ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Your help was very useful (just what I was looking for), but the NA number was too high, so many residuals couldn't be calculated (so a lot of information was missed). Finally, I opted for data imputation. –  Aug 18 '14 at 21:51