
I am trying to fit a logistic regression model with PyMC. However, the diagnostic plots show very high autocorrelations, and repeated sampling from the posterior sometimes gives very different results, so maybe I'm not using PyMC properly.

The model is the following:

Y_i ~ Bernoulli(p_i)
logit(p_i) = b0 + b1*x1_i + b2*x2_i

The implementation is this:

import pymc as pm
import numpy as np

# Vague priors; PyMC2's Normal takes a precision (tau), so tau=1e-6 means variance 1e6
b0 = pm.Normal('b0', 0., 1e-6, value=0.)
b1 = pm.Normal('b1', 0., 1e-6, value=0.)
b2 = pm.Normal('b2', 0., 1e-6, value=0.)

x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

@pm.deterministic
def p(b0=b0, b1=b1, b2=b2, x1=x1, x2=x2):
    # inverse-logit link: map the linear predictor onto a probability
    return np.exp(b0 + b1*x1 + b2*x2) / (1.0 + np.exp(b0 + b1*x1 + b2*x2))

obs = pm.Bernoulli('obs', p=p, value=value, observed=True)

m = pm.MCMC([obs, b0, b1, b2])

When I sample with m.sample(500000, 200000, 50) (iterations, burn-in, thinning) and plot the resulting posterior distributions, I get this:

[Trace, autocorrelation and posterior histogram plots for b0, b1 and b2, all showing very high autocorrelation]
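
For completeness, this is roughly how the samples and the plots above are produced (just a sketch; pm.Matplot.plot is what draws the trace/autocorrelation/histogram panels):

m.sample(iter=500000, burn=200000, thin=50)  # iterations, burn-in, thinning
pm.Matplot.plot(m)   # trace, autocorrelation and histogram for b0, b1, b2
print(m.stats())     # numeric posterior summaries (mean, sd, 95% HPD)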

In a second attempt to get better results, I used @pm.observed:

import pymc as pm
import numpy as np

b0 =  pm.Normal('b0',  0., 1e-6, value=0.)
b1 =  pm.Normal('b1',  0., 1e-6, value=0.)
b2 =  pm.Normal('b2',  0., 1e-6, value=0.)

x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

# Linear predictor on the logit scale; arithmetic on PyMC variables yields a Deterministic
p = b0 + b1*x1 + b2*x2

@pm.observed
def obs(p=p, value=value):
    # Bernoulli log-likelihood of the observed 0/1 outcomes, with the inverse-logit link applied here
    return pm.bernoulli_like(value, pm.invlogit(p))

m = pm.MCMC([obs, p, b0, b1, b2])

But it also produces high autocorrelations.

I increased the sample size without much success. What am I missing?
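
For reference, a minimal sketch of how the autocorrelation and convergence can be quantified numerically (assuming the sampled m from the first model above; pm.geweke and pm.Matplot.geweke_plot are PyMC's built-in diagnostics, and the lag-1 correlation is computed by hand):

trace_b1 = m.trace('b1')[:]

# Geweke z-scores compare early and late segments of the chain;
# most scores should fall within +/- 2 if the chain has converged
scores = pm.geweke(trace_b1)
pm.Matplot.geweke_plot(scores, 'b1')

# Lag-1 autocorrelation of the (already thinned) trace; values near 1 mean poor mixing
print(np.corrcoef(trace_b1[:-1], trace_b1[1:])[0, 1])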

– r_31415

1 Answer

You definitely have some convergence issues. Part of the problem is that you have a very small sample size (n=9), but are trying to fit 3 parameters. I get a little better mileage by block-updating the parameters, but the intercept still does poorly:

# One vector-valued stochastic for all three coefficients, so they are updated as a block
beta = pm.Normal('beta', 0., 0.001, value=[0.]*3)

x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

# Deterministic node: inverse-logit of the linear predictor
p = pm.Lambda('p', lambda b=beta: pm.invlogit(b[0] + b[1]*x1 + b[2]*x2))

obs = pm.Bernoulli('obs', p=p, value=value, observed=True)

(BTW, when your parameters go through a logit transform, you don't need super-diffuse priors on the betas -- a precision smaller than 0.001 is pretty fruitless.) Things are improved further by using an adaptive Metropolis step method on the betas:

M.use_step_method(AdaptiveMetropolis, M.beta)
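
Roughly, putting the pieces together (a sketch only; the iteration counts are placeholders, and as noted in the comments below you may need to reference the step method as pm.AdaptiveMetropolis and pass the beta object directly):

M = pm.MCMC([obs, p, beta])
M.use_step_method(pm.AdaptiveMetropolis, beta)  # joint proposals for all three betas with an adapted covariance
M.sample(200000, 100000, 10)                    # placeholder iteration/burn/thin values
pm.Matplot.plot(M)                              # trace, autocorrelation and histogram per parameter
print(M.stats()['beta'])                        # posterior mean, sd and 95% HPD interval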

This results in convergence, but as you can see, there is very little information in the data to move the posterior away from the prior:

[Posterior summary plot for the beta coefficients]

– Chris Fonnesbeck
  • Thank you for your answer. You're right, the results look much better now. In this case, should I conclude that the variables `x1` and `x2` have no predictive power with respect to the variable `value`? (Assuming that this is the best model for the available data.) By the way, when I tried to use your code, I got an error saying that M doesn't have a "beta" property. I used this instead: `M.use_step_method(pm.AdaptiveMetropolis, beta)` – r_31415 Sep 01 '14 at 23:37
  • An additional question: I noticed that when I run `M = pm.MCMC([obs, beta])` and plot with `pm.Matplot.plot(M)`, I obtain posterior distributions for each parameter. However, if I change it to `M = pm.MCMC([obs, p, beta])`, plotting produces a lot of extra parameters named p_1, p_2, etc. What are those? Deterministic variables don't need to be included in the MCMC? If I remember correctly, using @pm.observed instead didn't generate those extra plots. – r_31415 Sep 01 '14 at 23:43
  • Deterministic variables are not subject to MCMC sampling, but we are often interested in their posterior distributions, so we save them as traces. You can turn this off, however, with a `trace=False` flag if you really don't want them (see the sketch after these comments). The values of observed variables do not change between iterations, so there is no trace for those. – Chris Fonnesbeck Sep 02 '14 at 00:17
  • Got it. Would you comment on the issue of predictive power I described above? By the way, I enjoyed your Bios366 course on github. – r_31415 Sep 02 '14 at 01:49
  • Thanks. Regarding "predictive power", you cannot tell just by looking at the posterior estimates, though the credible interval for beta2 is entirely positive, suggesting that it influences events to some extent. I would compare models formally using something like AIC (see the sketch after these comments) to see whether models perform better with or without one or more of the predictors. Just because an analysis is underpowered does not mean the variables themselves are of no use. – Chris Fonnesbeck Sep 02 '14 at 18:39
  • Oh, I thought the interval for beta2 spanned negative and positive values. Now I've confirmed it is at least very small but still positive. Thanks for all your help. – r_31415 Sep 03 '14 at 01:30
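
A small sketch illustrating the two points from the comments above (stated assumptions: it rebuilds the answer's model from scratch, pm.Lambda forwards trace=False to the underlying Deterministic, and pm.MAP exposes AIC/BIC after fit(), which is one way to do the formal comparison mentioned, not the only one):

import pymc as pm
import numpy as np

x1 = np.array([31, 31, 36, 30, 32, 33, 31, 33, 32])
x2 = np.array([31, 16, 35, 42, 19, 37, 29, 23, 15])
value = np.array([1, 1, 1, 1, 0, 1, 1, 1, 0])

beta = pm.Normal('beta', 0., 0.001, value=[0.]*3)

# Deterministics are traced by default; trace=False skips storing p's samples
p = pm.Lambda('p',
              lambda b=beta: pm.invlogit(b[0] + b[1]*x1 + b[2]*x2),
              trace=False)

obs = pm.Bernoulli('obs', p=p, value=value, observed=True)

# A maximum a posteriori fit exposes AIC/BIC, which can be compared against
# a reduced model (e.g. one that drops x1 or x2)
M = pm.MAP([obs, beta, p])
M.fit()
print(M.AIC, M.BIC)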