
(This question is probably easier to understand if you have access to Kruschke's Doing Bayesian Data Analysis book, since I'm trying to build the model on p 493)

Basically I'm trying to build this model https://drive.google.com/open?id=0BxzmgRR6TC-2dUN3VEs4enZZQ1U in PyMC3, by using the methods from this blog post http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/

This is the code I've arrived at:

```python
with pm.Model() as multi_level_model:
    # Top level
    mu_0 = pm.Normal('mu_0', mu=0, sd=100**2)
    sigma_0 = pm.Uniform('sigma_0', lower=0, upper=100)

    mu_1 = pm.Normal('mu_1', mu=0, sd=100**2)
    sigma_1 = pm.Uniform('sigma_1', lower=0, upper=100)

    # Middle
    beta_0 = pm.Normal('beta_0', mu=mu_0, sd=sigma_0, shape=n_subj)
    beta_1 = pm.Normal('beta_1', mu=mu_1, sd=sigma_1, shape=n_subj)

    nu = pm.Exponential('nu', lam=1/29.) + 1
    sigma = pm.Uniform('sigma', lower=0, upper=100)

    # subj_idx is a vector giving the subject each observation comes from
    # data_hier['X'] is the predictor
    mu_ij = beta_0[subj_idx] + beta_1[subj_idx] * data_hier['X'].values

    # Bottom
    # data_hier['Y'] holds the predicted values
    obs = pm.StudentT('obs', mu=mu_ij, sd=sigma, nu=nu,
                      observed=data_hier['Y'].values)

    # start = pm.find_MAP()  # This seems to never finish
    step1 = pm.Metropolis([sigma, sigma_0, sigma_1])
    step2 = pm.NUTS([mu_0, mu_1, nu, beta_0, beta_1])
    multi_level_trace = pm.sample(draws=2000, step=[step1, step2])
```

However, it doesn't converge as well as the same model built in PyStan (and takes longer to run). Any tips on how the model can be improved?

The data can be found here https://drive.google.com/file/d/0BxzmgRR6TC-2Qk5WcjhaaGJSTmM/view?usp=sharing

Jon Sjöberg
  • Try standardizing your data, or at least centering your "Xs". If you center the data you can set mu_0 = pm.Normal('mu_0', mu=Y.mean(), sd=10). If NUTS still has trouble, just use Metropolis with more steps and then burn in as necessary. BTW, you can directly use pm.Exponential('nu', lam=1/30), since the Student-t distribution is defined on the interval [0, inf]. – aloctavodia Jun 12 '16 at 23:57
  • Thanks! Standardising the data made it a lot better; all but the 'nu' parameter then converged OK. I'm not sure how important it is that that parameter converges? Switching to only using Metropolis and upping the sample size made it perform on a similar level to PyStan, both in the time it took to run the model and in the output. – Jon Sjöberg Jun 13 '16 at 11:24
  • You are welcome. Convergence is important, but in your model convergence for nu is probably not "that" important (I think Kruschke says something about this in his book). Let me know if you get some improvement. I will try to check your model and data in more detail in the next few days (especially to check why NUTS seems to be having so much trouble). – aloctavodia Jun 13 '16 at 19:03
  • Yeah, Kruschke writes "Occasionally, a run will produce a chain that is well behaved for all parameters except the normality parameter "nu". This chain nevertheless explores reasonable values of "nu" and is well behaved on other parameters. If this is worrisome, simply run JAGS again until all chains are well behaved (or translate to STAN)". Now as someone new to this, I don't really know if this is worrisome or not. But I can't seem to get a run where "nu" is well behaved with my implementation.. – Jon Sjöberg Jun 14 '16 at 16:48
  • Could you post an updated version of your model, including any data transformation? I am traveling and only have my phone now, but I can check it later. – aloctavodia Jun 14 '16 at 16:55
  • It would be awesome if you could check it, I created a repository here https://github.com/jonsjoberg/hierachical_linear_regression so the data is included as well. – Jon Sjöberg Jun 14 '16 at 17:43
  • Fwiw, I tried to build the quadratic model that Kruschke presents after this model, and again I ran into problems with PyMC3. So if you ever would like to extend your repository of models from DBDA in PyMC3, I would be very grateful if you included these models :) – Jon Sjöberg Jun 15 '16 at 20:45
  • I get good mixing using only Metropolis and centering or standardizing the data. I think your groups are very diverse in terms of their slope and intercept (if estimated as separate groups). – aloctavodia Jun 15 '16 at 21:48
  • I've pushed the output I'm seeing from Metropolis sampling (with standardization), as well as a quadratic model and the output from that, to the repository I linked to above. If you have time, it would be awesome if you could comment on whether you think those chains are mixing well enough? Anyway, a million thanks for your help! – Jon Sjöberg Jun 16 '16 at 15:12
  • I will do it ASAP. BTW feel free to contribute to the repository of [PyMC3 models from DBDA](https://github.com/aloctavodia/Doing_bayesian_data_analysis) – aloctavodia Jun 16 '16 at 17:20
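
The centering/standardizing suggested in the comments can be sketched as below. This is only an illustration with a made-up stand-in for data_hier (the column names X and Y match the question); the back-transform comments are the usual linear-regression rescaling algebra, not code from the thread:

```python
import pandas as pd

# Hypothetical stand-in for data_hier from the question.
data_hier = pd.DataFrame({
    'X': [1.0, 2.0, 3.0, 4.0, 5.0],
    'Y': [2.1, 3.9, 6.2, 8.1, 9.8],
})

def standardize(s):
    """Return a zero-mean, unit-sd copy of a pandas Series."""
    return (s - s.mean()) / s.std()

x_std = standardize(data_hier['X'])
y_std = standardize(data_hier['Y'])

# After sampling on the standardized scale, slopes and intercepts can be
# mapped back to the original scale:
#   beta_1_orig = beta_1_std * Y.std() / X.std()
#   beta_0_orig = Y.mean() + beta_0_std * Y.std() - beta_1_orig * X.mean()
```

If only centering (subtracting the mean, without dividing by the sd) is used, the intercept prior can then be placed around Y.mean(), as suggested in the comment above.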

0 Answers