I am trying to switch from WinBUGS to Stan, since I like the programming language and Stan integrates better with R. I recoded my hierarchical Bayesian model in the Stan language and applied all the recommended settings I could find online. However, my WinBUGS sampler is over twenty times as fast as the Stan sampler. (So I'm talking about sampling, not compilation, which I know is slower for Stan.)
I have tried to speed up the sampling in every way I could find online: I vectorized my model, and I am running my chains on 2 cores for Stan versus only 1 for BUGS. Every forum I can find says Stan shouldn't be much slower than BUGS. Does anyone have any idea why the speed difference is so large, and how I can speed up my sampling?
My models for BUGS and Stan are posted below. The first three lines of each model are the main part; the rest is just priors. T and N are both about 150.
Thank you!
BUGS:
model
{
  for (i in 1:N)
  {
    for (t in 1:T)
    {
      Y[i,t] ~ dnorm(mu[i,t], hy[i])
      mu[i,t] <- alpha1[i] + rho[i] * X0[i,t] + phi[i,t] * X1[i,t] + beta1[i] * X2[i,t] + beta2[i] * X3[i,t] + beta3[i] * X4[i,t] + beta4[i] * gt[i,t]
      phi[i,t] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * pow(M[i] - P[i,t], 2)) + beta5[i] * X5[i,t] + beta6[i] * X6[i,t]
    }
    alpha1[i] ~ dflat()
    hy[i] ~ dgamma(0,0)
    rho[i] ~ dflat()
    beta1[i] ~ dflat()
    beta2[i] ~ dflat()
    beta4[i] ~ dflat()
    alpha2[i] ~ dnorm(mu_alpha2, sigma_alpha2)
    gamma1[i] ~ dnorm(mu_gamma1, sigma_gamma1)
    gamma2[i] ~ dnorm(mu_gamma2, sigma_gamma2)
    M[i] ~ dnorm(mu_M, sigma_M)
    beta3[i] ~ dnorm(mu_beta3, sigma_beta3)
    beta5[i] ~ dnorm(mu_beta5, sigma_beta5)
    beta6[i] ~ dnorm(mu_beta6, sigma_beta6)
  }
  mu_alpha2 ~ dflat()
  mu_gamma1 ~ dflat()
  mu_gamma2 ~ dunif(-1.5, -0.5)
  mu_M ~ dunif(0.7, 1.3)
  mu_beta3 ~ dflat()
  mu_beta5 ~ dflat()
  mu_beta6 ~ dflat()
  sigma_alpha2 ~ dgamma(0.1, 0.1)
  sigma_gamma1 ~ dgamma(0.1, 0.1)
  sigma_gamma2 ~ dgamma(0.1, 0.1)
  sigma_M ~ dgamma(0.1, 0.1)
  sigma_beta3 ~ dgamma(0.1, 0.1)
  sigma_beta5 ~ dgamma(0.1, 0.1)
  sigma_beta6 ~ dgamma(0.1, 0.1)
}
Stan:
transformed parameters {
  for (i in 1:N) {
    phi[i] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * (M[i] - P[i]) .* (M[i] - P[i])) + beta4[i] * log(t) + beta5[i] * D4[i];
    mu[i] <- alpha1[i] + phi[i] .* X1[i] + rho[i] * X0[i] + beta1[i] * X2[i] + beta2[i] * X3[i] + beta3[i] * X4[i];
    sigma[i] <- sqrt(sigma_sq[i]);
  }
  sigma_beta3 <- sqrt(sigma_sq_beta3);
  sigma_gamma1 <- sqrt(sigma_sq_gamma1);
  sigma_gamma2 <- sqrt(sigma_sq_gamma2);
  sigma_M <- sqrt(sigma_sq_M);
  sigma_beta4 <- sqrt(sigma_sq_beta4);
  sigma_beta5 <- sqrt(sigma_sq_beta5);
  sigma_alpha2 <- sqrt(sigma_sq_alpha2);
}
model {
  // priors on hyperparameters
  alpha1 ~ normal(0, 100);
  rho ~ normal(0, 100);
  beta1 ~ normal(0, 100);
  beta2 ~ normal(0, 100);
  mu_beta3 ~ normal(0, 100);
  mu_alpha2 ~ normal(0, 100);
  mu_gamma1 ~ normal(0, 100);
  mu_gamma2 ~ uniform(-1.5, -0.5);
  mu_M ~ uniform(0.7, 1.3);
  mu_beta4 ~ normal(0, 100);
  mu_beta5 ~ normal(0, 100);
  sigma_sq ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta3 ~ inv_gamma(0.001, 0.001);
  sigma_sq_alpha2 ~ inv_gamma(0.001, 0.001);
  sigma_sq_gamma1 ~ inv_gamma(0.001, 0.001);
  sigma_sq_gamma2 ~ inv_gamma(0.001, 0.001);
  sigma_sq_M ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta4 ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta5 ~ inv_gamma(0.001, 0.001);
  // hierarchical priors on the per-unit coefficients
  beta3 ~ normal(mu_beta3, sigma_beta3);
  alpha2 ~ normal(mu_alpha2, sigma_alpha2);
  gamma1 ~ normal(mu_gamma1, sigma_gamma1);
  gamma2 ~ normal(mu_gamma2, sigma_gamma2);
  M ~ normal(mu_M, sigma_M);
  beta4 ~ normal(mu_beta4, sigma_beta4);
  beta5 ~ normal(mu_beta5, sigma_beta5);
  // likelihood
  for (i in 1:N) {
    Y[i] ~ normal(mu[i], sigma[i]);
  }
}
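In case it helps, the only further vectorization I can think of would be something like the sketch below (it assumes Y and mu are declared as matrix[N, T] and sigma as vector[N]; sigma_rep is just a helper name I made up here). Would this kind of change be expected to matter for sampling speed?

```stan
// Sketch only, not my full model.
// square() replaces the explicit (M[i] - P[i]) .* (M[i] - P[i]):
for (i in 1:N)
  phi[i] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * square(M[i] - P[i]))
            + beta4[i] * log(t) + beta5[i] * D4[i];

// Collapse the per-row likelihood loop into one vectorized statement,
// repeating sigma across the T columns (declaration would go at the
// top of the block in real Stan code):
matrix[N, T] sigma_rep;
sigma_rep <- rep_matrix(sigma, T);  // each of the T columns equals sigma
to_vector(Y) ~ normal(to_vector(mu), to_vector(sigma_rep));
```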