I built a Bayesian regression model in R and I'm trying to use it to make predictions for a record X that's not in the training set. The problem is, no matter how I change the value of one of the independent variables in X, the prediction remains the same! Can somebody shed some light on how predictions are made using Bayesian regression models? Here's some code to give you an idea of what I'm doing:

install.packages("BMA")
library(BMA)

D_for_Bayes_tweaked <- D_for_Bayes

bfit1 <- bic.glm(f = as.formula('freq_iso_chng2 ~ cpi_gasoline_proj + vhcl_age_proj + vhcl_sale_ltruck_proj'), data = D_for_Bayes[D_for_Bayes$year >= 2003 & D_for_Bayes$year <= 2019,], glm.family = Gamma(link = "log"))

(p1 <- predict(bfit1, newdata = D_for_Bayes[D_for_Bayes$year >= 2003 & D_for_Bayes$year <= 2020,], type = "response"))

cpi_gasoline_proj_change <- 0.025

new_gas_cpi <- D_for_Bayes[D_for_Bayes$year == 2020,]$cpi_gasoline_proj + cpi_gasoline_proj_change

D_for_Bayes_tweaked[D_for_Bayes_tweaked$year == 2020,]$cpi_gasoline_proj <- new_gas_cpi

bfit2 <- bic.glm(f = as.formula('freq_iso_chng2 ~ cpi_gasoline_proj + vhcl_age_proj + vhcl_sale_ltruck_proj'), data = D_for_Bayes_tweaked[D_for_Bayes_tweaked$year >= 2003 & D_for_Bayes_tweaked$year <= 2019,], glm.family = Gamma(link = "log"))
(p2 <- predict(bfit2, newdata = D_for_Bayes_tweaked[D_for_Bayes_tweaked$year >= 2003 & D_for_Bayes_tweaked$year <= 2020,], type = "response"))
Chris J
  • Not familiar with `BMA` but are you fitting a second model with the new X's? If so, there might be your problem. Try using the first model to predict the tweaked data. – Adam B. Mar 21 '20 at 20:53
  • No, the only data that's getting tweaked is the test data. bfit1 and bfit2 are the same, but I'm feeding one set of predictors into bfit1 and an altered set of predictors into bfit2. In any case, I did try to predict the values in D_for_Bayes_tweaked using bfit1, and I got the same predictions when I used bfit1 to predict the values in D_for_Bayes. – Chris J Mar 21 '20 at 22:05
  • Hi! This question is more about theoretical statistics and may not be a Stack Overflow topic. Two comments: (1) if you're running Bayesian regression, I really recommend using the rstanarm package; (2) Bayesian prediction is done through the posterior predictive distribution, which under normal-normal conjugacy is easy to obtain analytically. – Gibran Peniche Mar 22 '20 at 00:13
  • It's not really that theoretical. Bottom line is, I need to know what this line of code is doing: (p1 <- predict(bfit1, newdata = D_for_Bayes[D_for_Bayes$year >= 2003 & D_for_Bayes$year <= 2020,], type = "response")) – Chris J Mar 22 '20 at 00:18
  • "I'm feeding one set of predictors into bfit1 and an altered set of predictors into bfit2" Both of the models will give you similar answers because you're fitting them to predict the same response. They might have different coefficient estimates, but the predictions could be pretty much the same. Models always try to predict the response as best as they can, given circumstances. – Adam B. Mar 22 '20 at 05:29
  • "In any case, I did try to predict the values in D_for_Bayes_tweaked using bfit1, and I got the same predictions when I used bfit1 to predict the values in D_for_Bayes." That could mean that the coefficient estimate for cpi_gasoline_proj is very small, close to zero. I.e., there might not be any useful information in cpi_gasoline_proj to predict the response. Have you checked the model summary/marginal posteriors? – Adam B. Mar 22 '20 at 05:31
  • Adam, the models were both fit to the same training data, so they have the same coefficients. What's changing is the test data. The coefficient of cpi_gasoline_proj isn't small enough to account for an exact agreement in the predicted values from the two different test records. – Chris J Mar 22 '20 at 10:13

1 Answer

OK, I figured out how to solve my problem. Here's the underlying issue: I was trying to predict the value of freq_iso_chng2 for the year 2020, but the 2020 value of freq_iso_chng2 in the data was NA, and that affects the prediction. If you replace the NA value of freq_iso_chng2 with any specific number, then the predictions DO respond to changes in the predictor variables. I don't know why this works, but it does. For some reason, predict() on this model behaves differently depending on whether the target in newdata started out missing or started out as a number.
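
One possible explanation, offered as an assumption rather than something checked against the BMA source: many R modeling functions build their design matrix through model.frame(), and with na.omit (the factory-fresh na.action) that step drops any row with an NA in a variable named in the formula, the response included. If predict() on a bic.glm object does something similar with newdata, a row with a missing target would be treated differently from a row with a number in it. The base-R sketch below only demonstrates the model.frame() behavior; the data frame toy and its columns y and x are made up for illustration and are not the real data.

```r
# Toy data: the last row stands in for the 2020 record, whose response is unknown.
# All names and values here are hypothetical.
toy <- data.frame(
  year = 2018:2020,
  y    = c(1.2, 0.9, NA),
  x    = c(0.10, 0.20, 0.30)
)

# na.omit (the factory-fresh default na.action) drops every row that has an NA
# in any variable used by the formula, including the response y.
mf_default <- model.frame(y ~ x, data = toy, na.action = na.omit)
nrow(mf_default)  # 2: the 2020 row is gone

# Filling the missing response with an arbitrary placeholder keeps the row.
toy_filled <- toy
toy_filled$y[is.na(toy_filled$y)] <- 1
mf_filled <- model.frame(y ~ x, data = toy_filled, na.action = na.omit)
nrow(mf_filled)   # 3: the 2020 row survives
```

The placeholder is only there to keep the row from being dropped; a response value in newdata should not enter the prediction itself.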

Chris J