
All,

I am doing Bayesian modeling using rjags. However, when the number of observations is larger than 1000, the graph size becomes too big.

More specifically, I am working on a Bayesian ranking problem. Traditionally, one observation is an X[i, 1:N]-Y[i] pair, where X[i, 1:N] is an N-dimensional predictor vector representing the i-th item and Y[i] is the response. The objective is to minimize a point-wise error of the predicted values, for example, the least-squares error.

A ranking problem is different. Since we care more about the order, we use a pair-wise 0-1 indicator to represent the order between Y[i] and Y[j]: when Y[i] > Y[j], I(i,j) = 1; otherwise I(i,j) = 0. We treat each 0-1 indicator as an observation. Therefore, with K items Y[1:K], the number of indicators is 0.5*K*(K-1). Hence, when K increases from 500 to 5000, the number of observations grows from about 0.5*500^2 ≈ 125,000 to about 0.5*5000^2 ≈ 12.5 million. The graph size of the rjags model becomes correspondingly large (for example, graph size > 500,000), and the log-posterior becomes a very large negative number.
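For concreteness, here is a minimal sketch of such a pairwise-indicator model in rjags. It is illustrative only: it assumes a latent score theta[k] per item and a Bradley-Terry-style Bernoulli likelihood, which is not necessarily my exact model, and the names (theta, ind, the pair index vectors i and j) are made up. It shows where the quadratic blow-up comes from: every one of the 0.5*K*(K-1) indicators is a separate node in the JAGS graph.

    library(rjags)
    set.seed(1)

    # Bradley-Terry-style sketch: each pairwise indicator is one graph node.
    model_string <- "
    model {
      for (k in 1:K) {
        theta[k] ~ dnorm(0, 1)               # latent score of item k
      }
      for (p in 1:P) {                        # P = 0.5 * K * (K - 1) pairs
        logit(q[p]) <- theta[i[p]] - theta[j[p]]
        ind[p] ~ dbern(q[p])                  # ind[p] = 1 when Y[i[p]] > Y[j[p]]
      }
    }
    "

    K          <- 50                          # toy size; K = 5000 gives ~12.5M indicator nodes
    pairs      <- t(combn(K, 2))              # all 0.5*K*(K-1) item pairs
    theta_true <- rnorm(K)
    ind        <- as.integer(theta_true[pairs[, 1]] > theta_true[pairs[, 2]])

    jm   <- jags.model(textConnection(model_string),
                       data = list(K = K, P = nrow(pairs),
                                   i = pairs[, 1], j = pairs[, 2], ind = ind))
    samp <- coda.samples(jm, "theta", n.iter = 1000)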

Training also takes a long time to complete; I think it is more than 40 hours, which makes further experiments impractical. Therefore, do you have any ideas for speeding up rjags? I have heard that RStan is faster than rjags. Has anyone had a similar experience?


Jack Fu
  • Whether Stan would be faster depends on the model. But one thing we emphasize in our comparisons between Stan and BUGS-derivatives is that the wall time to complete a fixed number of iterations is not as relevant as the number of effective samples from the posterior distribution per second. Sometimes Gibbs samplers complete many iterations in a short period of time but the draws are highly dependent and thus yield few effective samples from the posterior distribution. In any event, if JAGS isn't feasible for your problem, then you should probably give Stan a try. – Ben Goodrich Nov 21 '13 at 04:20
  • Since Stan works directly with the log-posterior rather than parsing a graphical model, there shouldn't be a technical problem. So, if you can write down the log-posterior, then it should be fine. Presumably there would be a latent vector of length K, a likelihood of observing the pairwise preferences given the latent vector, and some priors (see the Stan sketch after these comments). – Ben Goodrich Nov 21 '13 at 20:20
  • @BenGoodrich I see. Stan actually works with the log posterior, which is why Stan allows us to write down the log probability function in place of sampling statements. BTW, I have another question for you. I found that integer parameters and integer transformed parameters are not allowed. However, my model contains a Gaussian mixture, so I need a latent integer vector to represent the cluster assignment of each point. – Jack Fu Nov 22 '13 at 02:08
  • @BenGoodrich I implemented this RStan code, but it seems pretty slow with chains=4 and iter=4000. Could you give me a hand with optimizing the Stan code? – Jack Fu Nov 22 '13 at 05:43
  • I may be able to help, but you would need to post the Stan code (either here or on the stan-users Google group). It is true that Stan does not support discrete parameters, but as far as I know, you can use a continuous latent vector, and the ranks of that would then give the posterior order. But if it is really necessary to use discrete parameters, it is often possible to marginalize the log-posterior over them. There are several examples in the Stan manual, but the details depend on the particular problem (a marginalization sketch appears after these comments). – Ben Goodrich Nov 23 '13 at 02:47
  • @BenGoodrich I sent you the files via email. – Jack Fu Nov 23 '13 at 05:52
  • @BenGoodrich Regarding "you can use a continuous latent vector and then the ranks of that would be the posterior order": let "int R[I]" denote the cluster assignments of I points. Then for point i, R[i] ~ categorical(eta), where eta is the mixture proportion vector, so R[i] should be an integer. It seems I cannot convert an integer value drawn from a categorical distribution to a real value, so how can I use a continuous latent vector to store the cluster assignments of the I points? – Jack Fu Nov 23 '13 at 06:28
  • @BenGoodrich Can I extract the local variables defined in the model block? Since variables cannot be assigned to when they are parameters, I have to define them in the model block, but I need to see the values of those local variables. – Jack Fu Nov 23 '13 at 07:19
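Following Ben's suggestion above, here is a minimal Stan version of the pairwise model, written directly as data, parameters, and a log-posterior. It is again a sketch with illustrative names, not my exact model: a latent vector theta of length K, a Bernoulli-logit likelihood for the pairwise indicators, and a normal prior.

    library(rstan)

    stan_code <- "
    data {
      int<lower=2> K;                        // number of items
      int<lower=1> P;                        // number of pairs, 0.5*K*(K-1)
      array[P] int<lower=1, upper=K> ii;     // first item of each pair
      array[P] int<lower=1, upper=K> jj;     // second item of each pair
      array[P] int<lower=0, upper=1> ind;    // 1 when Y[ii] > Y[jj]
    }
    parameters {
      vector[K] theta;                       // latent scores
    }
    model {
      theta ~ normal(0, 1);
      ind ~ bernoulli_logit(theta[ii] - theta[jj]);  // vectorized likelihood
    }
    "

    # Reusing the toy data built for the rjags sketch above.
    fit <- stan(model_code = stan_code,
                data = list(K = K, P = nrow(pairs),
                            ii = pairs[, 1], jj = pairs[, 2], ind = ind),
                chains = 4, iter = 2000)

As Ben notes, the fair comparison with JAGS is effective samples from the posterior per second, not wall time per iteration.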
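And here is the kind of marginalization Ben mentions for the Gaussian-mixture part: the discrete cluster assignment is summed out of the log-posterior with log_sum_exp, following the mixture examples in the Stan manual. The names (eta, mu, sigma) and priors are again illustrative.

    mixture_code <- "
    data {
      int<lower=1> N;                        // number of points
      int<lower=1> C;                        // number of mixture components
      vector[N] y;
    }
    parameters {
      simplex[C] eta;                        // mixture proportions
      ordered[C] mu;                         // ordered means, for identifiability
      vector<lower=0>[C] sigma;
    }
    model {
      mu ~ normal(0, 10);
      sigma ~ cauchy(0, 2.5);
      for (n in 1:N) {
        vector[C] lp;
        for (c in 1:C)
          lp[c] = log(eta[c]) + normal_lpdf(y[n] | mu[c], sigma[c]);
        target += log_sum_exp(lp);           // sum the assignment out
      }
    }
    "

Regarding the last comment above: local variables in the model block are not stored in the output. The usual way to inspect such quantities (for example, per-point cluster probabilities) is to recompute them in a generated quantities block, whose values are saved with the draws.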

0 Answers