
I am running Bayesian hierarchical modeling in R using R2jags. When I open code I used a month ago and run it on the same dataset as before (verified by "date modified" in Windows Explorer), I get different results than I got a month ago. The only difference I can think of is that I got a new work computer in the last month and we installed JAGS 4.3.0; I was previously using 4.2.0.

Is it remotely possible to get different results just from updating my version of JAGS? I'm not posting code or results here because I don't need help troubleshooting it - everything is exactly the same.

Edit:

Convergence seems fine - Geweke diagnostics, autocorrelation plots, and trace plots all look good. That hasn't changed.
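
For reference, the checks look roughly like this (a minimal sketch, assuming fit is the object returned by R2jags::jags(); the object name is illustrative):

library(R2jags)
library(coda)

# convert the R2jags fit to a coda mcmc.list for diagnostics
# (as.mcmc() here relies on the method R2jags provides for its fit objects)
mc <- as.mcmc(fit)

geweke.diag(mc)     # Geweke convergence diagnostic per chain
autocorr.plot(mc)   # autocorrelation plots per parameter
traceplot(mc)       # trace plots per parameter
effectiveSize(mc)   # effective sample size per parameter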

I have a seed set both via set.seed() and jags.seed=. Is that enough? I've never had a problem replicating these types of results before.
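
That is, roughly like this (a sketch; the data, model file, and parameter names are placeholders):

library(R2jags)

set.seed(123)                          # R-side seed
fit <- jags(data = my_data,            # 'my_data' and 'mymodel.txt' are placeholders
            parameters.to.save = c("beta"),
            model.file = "mymodel.txt",
            n.chains = 3,
            n.iter = 50000,
            jags.seed = 123)           # the jags.seed= argument mentioned above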

As far as how different the results are, they are large enough to cause a meaningful difference in the inference. I am assessing relationships between 30 chemical exposures and a health outcome among 336 humans. Here are two examples; Chemical B troubles me the most because of the credible interval shift, and Chemical A is another example.

[Image: Results change, JAGS 4.2.0 vs JAGS 4.3.0]

I also doubled the number of iterations from 50k to 100k which resulted in very minor/inconsequential differences.

Edit 2:

I posted on SourceForge asking about the different default RNGs between versions: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/52bfef7d17/

  • I thought Gibbs samplers were non-deterministic. Exactly how "different" are these results? – IRTFM Jul 19 '22 at 02:27
  • Further to @IRTFM comment, MCMC sampling processes are stochastic in nature. So unless you set a fixed random seed (by either using `set.seed()` or specifying a `seed` argument to `jags()`) your results will never be exactly the same. The interesting thing is to see *how* different results are. If they are vastly different, they may indicate that chains didn't mix properly. – Maurits Evers Jul 19 '22 at 04:25
  • Edited to add some examples of how results changed. I do have seeds set. Thanks! – pakalla Jul 19 '22 at 17:27
  • To my eyes those are sufficiently similar to say that there is no real difference. I’d investigate the possibility of a different RNG across versions. – IRTFM Jul 19 '22 at 18:11
  • @IRTFM even the change in direction for the credible interval from -0.38 to 0.40? It substantially changes the interpretation of the finding. Either way, there must be something to explain why it changed at all. I'm looking into different RNGs but a lot of this is over my head. – pakalla Jul 19 '22 at 22:38
  • The point estimates are comparable. The widths of the “confidence intervals” are similar. If you are using Bayesian methods you should not be making Fisherian or Neyman-Pearson interpretations. – IRTFM Jul 19 '22 at 23:46

1 Answer


There are at least three possible reasons why you might see a difference between the results from these models:

  1. One or both of your attempts to fit this model did not converge, and/or your effective sample size is so small that random sampling error is having a large impact on your inference. If you have already checked to ensure convergence and sufficient effective sample size (for both models) then you can rule this out.

  2. You are seeing small differences in the posteriors due to the random sampling inherent to MCMC in otherwise converged results. If these differences are big enough to cause a meaningful difference in inference then your effective sample size is not high enough - so just run the models for longer and the difference should reduce. You can also set the random seed in JAGS using initial values for .RNG.seed and .RNG.name so that successive model runs are numerically identical (see the sketch after this list). If you run the models for longer and this difference does not reduce (or if it is a large difference to begin with) then you can rule this out.

  3. Your model contains a node for which the default sampling scheme changed between JAGS 4.2.0 and 4.3.0 - there were some changes to sampling schemes (and the order of precedence for assigning samplers to nodes) that could conceivably have changed your results (from memory I think this affected GLM particularly, but I can't remember exactly). However, although this may affect the probability of convergence, it should not substantially affect the posterior if the model does converge. It may be contributing to a numerical difference as explained for point (2) though.
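
To expand on point (2), here is a minimal sketch of fixing the RNG per chain via initial values, using plain rjags (the toy model and data are placeholders; substitute your own):

library(rjags)

# fix the simulated toy data so the whole script is reproducible
set.seed(1)
dat <- list(y = rnorm(20), N = 20)

# one inits list per chain, fixing both the RNG type and its seed
inits <- list(
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 1),
  list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = 2)
)

mod <- jags.model(textConnection("
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.001)
  tau ~ dgamma(0.001, 0.001)
}
"), data = dat, inits = inits, n.chains = 2)

# successive runs of this script now produce numerically identical samples
samples <- coda.samples(mod, variable.names = c("mu", "tau"), n.iter = 5000)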

I'd recommend first ensuring convergence of both models, and then (assuming they did both converge) looking at exactly how much of a difference you are seeing. If it looks like both models converged AND the difference is more than just random sampling variation, then please reply here and/or update your question (as that shouldn't happen ... i.e. we may need to look into the possibility of a bug in JAGS).

Thanks,

Matt

--------- Edit following additional information added to the question --------

Based on what you have written, it does seem that the difference in inference exceeds what might be expected due to random variation, so there may be some kind of underlying issue here. In order to diagnose this further we would need a minimal reproducible example (https://stackoverflow.com/help/minimal-reproducible-example). This means that you would need to provide not only the model (or preferably a simplified model that still exhibits the problem) but also some data to which we can fit the model. If your data are too sensitive to share then this could be a fictitious dataset for which you also see a difference between JAGS 4.2.0 and JAGS 4.3.0.

The official help forum for JAGS is at https://sourceforge.net/p/mcmc-jags/discussion/610037/ - so you can certainly post there, although we would still need a minimal reproducible example to be able to do anything. If you do so, then please update both posts with a link to the other so that anyone reading either post knows about the cross-posting. You should also note that R2jags is not officially supported on the JAGS forums, so please provide the minimal reproducible example using plain rjags code (or runjags if you prefer) rather than using the R2jags wrapper.
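
As a starting point, a reprex skeleton might look something like the following (entirely fictitious data; replace the model with a simplified version of yours that still exhibits the problem):

library(rjags)

# fictitious data standing in for the real (sensitive) dataset
set.seed(1)
dat <- list(y = rnorm(50, mean = 2), N = 50)

mod <- jags.model(textConnection("
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.0001)
  tau ~ dgamma(0.01, 0.01)
}
"), data = dat, n.chains = 2, quiet = TRUE)

update(mod, 1000)   # burn-in
post <- coda.samples(mod, variable.names = c("mu", "tau"), n.iter = 10000)
summary(post)       # compare this output between JAGS 4.2.0 and 4.3.0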

To answer your question in the comments: in order to obtain information on the samplers used you can use rjags::list.samplers(), e.g.:

library(rjags)

# LINE is just a small example model built into rjags:
data(LINE)
LINE$recompile()

# list the sampler assigned to each node in the model:
list.samplers(LINE)

# $`bugs::ConjugateGamma`
# [1] "tau"

# $`bugs::ConjugateNormal`
# [1] "alpha"

# $`bugs::ConjugateNormal`
# [1] "beta"

Matt Denwood
  • Thank you for such a comprehensive reply. I edited my post to add more information to address your points. Is there a way for me to look further into point 3 (nodes)? – pakalla Jul 19 '22 at 17:28
  • I’m guessing that there is a mailing list for JAGS and other Bayesian packages. What does the DESCRIPTION file say? https://cran.r-project.org/web/packages/R2jags/index.html – IRTFM Jul 19 '22 at 18:14
  • Thanks, working on the reproducible example. It'll take me a while because I need IT to reinstall JAGS 4.2.0 and I don't know plain rjags code. I'll update when I've got it. – pakalla Jul 21 '22 at 18:54