I am trying to switch from WinBUGS to Stan, since I like the programming language and Stan integrates better with R. I recoded my hierarchical Bayesian model in the Stan language and applied all the recommended settings I could find online. However, my WinBUGS sampler is over twenty times as fast as the Stan sampler. (So I'm talking about sampling, not compilation, which I know is slower for Stan.)
I have tried to speed up the sampling in every way I could find online: I vectorized my model, and I am running my chains on 2 cores for Stan versus only 1 for BUGS. Every forum I can find says Stan shouldn't be much slower than BUGS. Does anyone have any idea why the speed difference is so large, and how I can speed up my sampling?
My models for BUGS and Stan are posted below. The first three lines of each model are the main part; the rest is just priors. T and N are both about 150.
Thank you!
BUGS:
model
{
  for (i in 1:N)
  {
    for (t in 1:T)
    {
      Y[i,t] ~ dnorm(mu[i,t], hy[i])
      mu[i,t] <- alpha1[i] + rho[i] * X0[i,t] + phi[i,t] * X1[i,t] + beta1[i] * X2[i,t] + beta2[i] * X3[i,t] + beta3[i] * X4[i,t] + beta4[i] * gt[i,t]
      phi[i,t] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * pow(M[i] - P[i,t], 2)) + beta5[i] * X5[i,t] + beta6[i] * X6[i,t]
    }
    alpha1[i] ~ dflat()
    hy[i] ~ dgamma(0,0)
    rho[i] ~ dflat()
    beta1[i] ~ dflat()
    beta2[i] ~ dflat()
    beta4[i] ~ dflat()
    alpha2[i] ~ dnorm(mu_alpha2, sigma_alpha2)
    gamma1[i] ~ dnorm(mu_gamma1, sigma_gamma1)
    gamma2[i] ~ dnorm(mu_gamma2, sigma_gamma2)
    M[i] ~ dnorm(mu_M, sigma_M)
    beta3[i] ~ dnorm(mu_beta3, sigma_beta3)
    beta5[i] ~ dnorm(mu_beta5, sigma_beta5)
    beta6[i] ~ dnorm(mu_beta6, sigma_beta6)
  }
  mu_alpha2 ~ dflat()
  mu_gamma1 ~ dflat()
  mu_gamma2 ~ dunif(-1.5, -0.5)
  mu_M ~ dunif(0.7, 1.3)
  mu_beta3 ~ dflat()
  mu_beta5 ~ dflat()
  mu_beta6 ~ dflat()
  sigma_alpha2 ~ dgamma(0.1, 0.1)
  sigma_gamma1 ~ dgamma(0.1, 0.1)
  sigma_gamma2 ~ dgamma(0.1, 0.1)
  sigma_M ~ dgamma(0.1, 0.1)
  sigma_beta3 ~ dgamma(0.1, 0.1)
  sigma_beta5 ~ dgamma(0.1, 0.1)
  sigma_beta6 ~ dgamma(0.1, 0.1)
}
Stan:
transformed parameters {
  for (i in 1:N) {
    phi[i] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * (M[i] - P[i]) .* (M[i] - P[i])) + beta4[i] * log(t) + beta5[i] * D4[i];
    mu[i] <- alpha1[i] + phi[i] .* X1[i] + rho[i] * X0[i] + beta1[i] * X2[i] + beta2[i] * X3[i] + beta3[i] * X4[i];
    sigma[i] <- sqrt(sigma_sq[i]);
  }
  sigma_beta3 <- sqrt(sigma_sq_beta3);
  sigma_gamma1 <- sqrt(sigma_sq_gamma1);
  sigma_gamma2 <- sqrt(sigma_sq_gamma2);
  sigma_M <- sqrt(sigma_sq_M);
  sigma_beta4 <- sqrt(sigma_sq_beta4);
  sigma_beta5 <- sqrt(sigma_sq_beta5);
  sigma_alpha2 <- sqrt(sigma_sq_alpha2);
}
model {
  // priors on hyperparameters
  alpha1 ~ normal(0, 100);
  rho ~ normal(0, 100);
  beta1 ~ normal(0, 100);
  beta2 ~ normal(0, 100);
  mu_beta3 ~ normal(0, 100);
  mu_alpha2 ~ normal(0, 100);
  mu_gamma1 ~ normal(0, 100);
  mu_gamma2 ~ uniform(-1.5, -0.5);
  mu_M ~ uniform(0.7, 1.3);
  mu_beta4 ~ normal(0, 100);
  mu_beta5 ~ normal(0, 100);
  sigma_sq ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta3 ~ inv_gamma(0.001, 0.001);
  sigma_sq_alpha2 ~ inv_gamma(0.001, 0.001);
  sigma_sq_gamma1 ~ inv_gamma(0.001, 0.001);
  sigma_sq_gamma2 ~ inv_gamma(0.001, 0.001);
  sigma_sq_M ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta4 ~ inv_gamma(0.001, 0.001);
  sigma_sq_beta5 ~ inv_gamma(0.001, 0.001);
  // hierarchical priors on the per-unit coefficients
  beta3 ~ normal(mu_beta3, sigma_beta3);
  alpha2 ~ normal(mu_alpha2, sigma_alpha2);
  gamma1 ~ normal(mu_gamma1, sigma_gamma1);
  gamma2 ~ normal(mu_gamma2, sigma_gamma2);
  M ~ normal(mu_M, sigma_M);
  beta4 ~ normal(mu_beta4, sigma_beta4);
  beta5 ~ normal(mu_beta5, sigma_beta5);
  // likelihood
  for (i in 1:N) {
    Y[i] ~ normal(mu[i], sigma[i]);
  }
}
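In case it helps, the only further vectorization I can think of would be something like the sketch below (it assumes Y and mu are declared as matrix[N, T] and sigma as vector[N]; sigma_rep is just a helper name I made up here). Would this kind of change be expected to matter for sampling speed?

```stan
// Sketch only, not my full model.
// square() replaces the explicit (M[i] - P[i]) .* (M[i] - P[i]):
for (i in 1:N)
  phi[i] <- alpha2[i] + gamma1[i] * exp(gamma2[i] * square(M[i] - P[i]))
            + beta4[i] * log(t) + beta5[i] * D4[i];

// Collapse the per-row likelihood loop into one vectorized statement,
// repeating sigma across the T columns (declaration would go at the
// top of the block in real Stan code):
matrix[N, T] sigma_rep;
sigma_rep <- rep_matrix(sigma, T);  // each of the T columns equals sigma
to_vector(Y) ~ normal(to_vector(mu), to_vector(sigma_rep));
```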