0

I have a fitted Poisson model in statsmodels. For each of my observations I want to calculate the probability of observing a value that is at least that high. In other words I want to calculate:

P(y >= y_i | x_i)

(This should be possible, because the fitted model predicts some value lambda as a function of my independent variable x. This lambda_i value defines a Poisson distribution, from which I should be able to derive a probability.)

My question is really about the implementation in statsmodels, less about the statistics. Although if you believe it is relevant, please do elaborate.

Willem
  • 976
  • 9
  • 24
  • Not sure if I understand correctly, but don't you just have to call the `cdf` method: `1 - model.cdf(y_i)`? I don't quite get the conditioning on `x_i`. – Timus Sep 30 '20 at 15:06
  • I can't see any `cdf` method on the Poisson model object. That said, the conditioning on `x_i` is important, but maybe it is automatically included in the model method. The reason it is important is because the model basically fits a Poisson distribution at each value of x. (The Poisson distribution is parameterized by lambda, where lambda will be a function of x. If we do not use x, the distribution would be the same for all the data, which is incorrect.) – Willem Sep 30 '20 at 15:15

1 Answers1

1

For Poisson we can just use the distribution from scipy.stats to compute results for given predicted means.

e.g.

mu = my_results.predict(...)
stats.poisson.sf(counts, mu)

similar usage with pmf is in https://github.com/statsmodels/statsmodels/blob/master/statsmodels/discrete/discrete_model.py#L3922

Josef
  • 21,998
  • 3
  • 54
  • 67
  • I'm looking to do something similar also for a fitted gamma distribution. As mentioned in other questions this is not so easy because the shape parameter mentioned in the model summary isn't the actual shape: https://stackoverflow.com/questions/60215085/calculating-scale-dispersion-of-gamma-glm-using-statsmodels. I've seen your answer in this question: https://stackoverflow.com/questions/41749167/glm-gamma-regression-in-python-statsmodels. But I don't understand the resulting distribution and documentation is extremely limited. Any help/example would be appreciated! Thanks in advance! – Willem Oct 01 '20 at 12:07
  • My statsmodels issue comment seems to have the correct parameter transformation, in response to the stackoverflow question 602.... https://github.com/statsmodels/statsmodels/issues/106#issuecomment-586072934 – Josef Oct 01 '20 at 12:48
  • In that github issue you change the parametrization to generate the data with `numpy`. But that only raises the question: is the parameterization in `numpy` strange, and the one in `statsmodels` correct? Or is the parameterization in `statsmodels` the one that is strange, so you need to adapt the `numpy` data generation? In any case I've posted a new question that explains it better: https://stackoverflow.com/questions/64174603/how-to-use-scale-and-shape-parameters-of-gamma-glm-in-statsmodels. In any case thanks for your help so far! – Willem Oct 02 '20 at 15:53