0

I am confused about the interpretation for the negative binomial regression with python pymc3 package. I am not sure how to interpret the mu and alpha in GLM. Here I have a simple vector, and I want to find the NB regression model for itself:

# The data
y = [100,200,300,400,50,300,60,89,90,100,100]
data = {'y':y, 'x':[1]*len(y)}
basic_model = pm.Model()
with basic_model:
    fml = 'y~x'
    pm.glm.GLM.from_formula(formula=fml, data=data, family=pm.glm.families.NegativeBinomial())
    # draw 500 posterior samples
    trace = pm.sample(500)
summary = pm.summary(trace, varnames=rvs)[['mean','hpd_2.5','hpd_97.5']]
print(summary)

Then I got output like:

                 mean     hpd_2.5    hpd_97.5
Intercept -281.884463 -684.069010  718.375125
x          287.000388 -714.168056  689.477911
mu          26.674426    3.526181   63.358150
alpha        2.461808    1.353676    3.452103

I understand that the Intercept & x part as y = exp(-281.884463*287.000388*x) from here.

But how to interpret the mu and alpha? I tried to use stats.gamma.rvs(alpha, scale=mu / alpha, size=size) but the histogram looks way off. Thank you!

kikyo91
  • 37
  • 1
  • 8

1 Answers1

0

So, the alpha and the mu parameters are parameters of a Exponential distribution where mu is the mean and alpha is the gamma parameter. So in exponential distribution, 1/gamma is mean, and 1/(gamma^2) is the variance, if mu is mean that means the mu / alpha is the variance as stated in the scale = mu / alpha in the function call.

The way to think about it is something like this:

There is an interesting (key), relationship between the Poisson and Exponential distribution. If you expect gamma events on average for each unit of time, then the average waiting time between events is Exponentially distributed, with parameter gamma (thus average wait time is 1/gamma), and the number of events counted in each unit of time is Poisson distributed with parameter gamma.

I hope this makes it a little more clear and gives you some intuition how to think about it.

Novak
  • 2,143
  • 1
  • 12
  • 22
  • Thank you so much! I read about the [poisson-gamma mixture](https://www.icpsr.umich.edu/CrimeStat/files/CrimeStatAppendix.D.pdf) after reading exponential distribution you mentioned. So the `mu` (not the `np.exp(mu)`) here would be the mean/expectation for dependent variable y, and `alpha` (not the `np.exp(alpha)`) would be the gamma shape parameter for all independent variables, which are hypothesized to be negative binomial distributed. But for dependent variables, like `x` here, the 'np.exp(x)' is the mean. – kikyo91 Oct 17 '18 at 01:30
  • 1
    And `f = stats.gamma.rvs(alpha, scale=mu / alpha, size=size)` only give me a gamma distribution, if then use `stats.poisson.rvs(f)` would give me negative binomial distributed samples. – kikyo91 Oct 17 '18 at 01:32