
I have a model described in pymc3 using the following:

from pymc3 import * 
basic_model = Model()

with basic_model:
    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta = Normal('beta', mu=0, sd=10, shape=18)
    sigma = HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2 + beta[2]*X3

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

However, my Ys are not normally distributed but binary (so Bernoulli, I think). I can't figure out how to change the Normal distribution of `Y_obs` to a Bernoulli, because I don't know what its parameters would be in that case.

recluze
1 Answer


What you are looking for is logistic regression. Here you use the logistic function to convert the output of your linear model to a probability.
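For intuition, the logistic (sigmoid) function `1 / (1 + exp(-x))` squashes any real-valued linear predictor into the interval (0, 1), so its output can serve directly as a Bernoulli probability. A quick plain-Python check (no PyMC3 needed; the function name is just for illustration):

```python
import math

def logistic(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, zero maps to exactly 0.5,
# and large positive inputs approach 1.
print(logistic(-5))  # ~0.0067
print(logistic(0))   # 0.5
print(logistic(5))   # ~0.9933
```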

In your example this could be specified as follows:

import pymc3 as pm
import theano.tensor as T
basic_model = pm.Model()

def logistic(l):
    # inverse-logit: squashes any real number into (0, 1)
    return 1 / (1 + T.exp(-l))

with basic_model:
    # Priors for unknown model parameters
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10, shape=18)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2 + beta[2]*X3

    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Bernoulli('Y_obs', p=logistic(mu), observed=Y)
Kiudee
  • Thanks, that makes sense (I knew about logistic regression but couldn't see how to model it in a Bayesian way). I'm giving your changes a try right now and will post here how it goes. – recluze Sep 15 '15 at 15:46
  • Well, it works but I get "exact" values for all `beta`s with standard deviation of `0.00` for all of them! I'm not even sure what question to ask here in order to proceed but I'm sure I'm doing something wrong :( ... (My objective is simply to get some insight into my data -- purely exploratory) – recluze Sep 15 '15 at 15:55
  • Sounds like the sampler has difficulties accepting new points to sample. Which sampler do you use and how do you set the starting point? [Here](https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/logistic.py) is an example of logistic regression with pymc3 which you could try. – Kiudee Sep 15 '15 at 19:11
  • That was a great pointer. I had gone to that github folder but missed the `logistic.py` file. That helped a lot. I'm now training well with a basic `find_MAP` starting point and the NUTS sampler. – recluze Sep 16 '15 at 07:13
  • It may also be helpful to include random error in the log-odds (what you're calling `mu` here), by adding normally-distributed noise: ```eps = pm.Normal('eps', mu=0, sd=0.2); mu = alpha + beta[0]*X1 + beta[1]*X2 + beta[2]*X3 + eps``` – David Diaz Oct 14 '21 at 15:23
  • 1
    Also worth noting that PyMC3 (and other probabilistic packages like Stan or Pyro/Numpyro) offer parameterization of a Bernoulli random variable using logits, so you could skip the logistic here and use `Y_obs = pm.Bernoulli('Y_obs', logit_p=mu, observed=Y)`. According to the Stan documentation, this logit parameterization of the Bernoulli distribution is more numerically stable... – David Diaz Oct 14 '21 at 15:31
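To illustrate the stability point from the last comment, here is a small NumPy sketch (my own illustration, not PyMC3's actual internals) comparing the Bernoulli log-likelihood computed naively through an explicit sigmoid against the logit-based form, which uses `np.logaddexp` to evaluate `log sigmoid(mu) = -log(1 + exp(-mu))` without ever forming the probability. For extreme values of `mu`, the naive version underflows to `-inf` while the logit form stays finite:

```python
import numpy as np

def loglik_naive(mu, y):
    """Bernoulli log-likelihood via an explicit sigmoid: log p or log(1 - p)."""
    p = 1.0 / (1.0 + np.exp(-mu))
    return np.where(y == 1, np.log(p), np.log(1.0 - p))

def loglik_logit(mu, y):
    """Same likelihood in logit form:
    log sigmoid(mu) = -log(1 + exp(-mu)), computed stably with logaddexp."""
    return np.where(y == 1, -np.logaddexp(0.0, -mu), -np.logaddexp(0.0, mu))

mu = np.array([0.0, 5.0, -800.0])
y = np.array([1, 1, 1])

# Suppress the overflow/divide warnings the naive path triggers at mu = -800.
with np.errstate(over='ignore', divide='ignore'):
    print(loglik_naive(mu, y))   # last entry underflows to -inf
print(loglik_logit(mu, y))       # last entry stays finite (~ -800)
```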