
So I'm using the statsmodels package to do a Poisson regression on my data set. I made sure that my training y values are indeed integer counts. However, when I print the predicted values (testmodely below), they are floats.

I'm super confused. I expected whole numbers, since the data fitted to the model were whole numbers and a Poisson model describes count data. Do you have any idea where I'm making a mistake? Thanks a ton in advance.

import statsmodels.api as sm
poi_model = sm.GLM(trainingy,trainingx, family=sm.families.Poisson())
poi_results = poi_model.fit()
paramet = poi_results.params
testmodely = poi_model.predict(paramet, testx, linear=False)
mani

1 Answer


A Poisson model predicts the mean, which is the expected value or intensity of the Poisson random variable. This is in general not an integer. Given the Poisson intensity, we can get the full distribution for new observations, provided the distributional assumption is correct.

This is similar to logistic regression or logit, where the prediction is the probability of observing an event or class. That probability is also the mean or expected value of the corresponding random variable. In classification problems the probability is replaced by an assignment to the most likely class, which is binary (0 or 1) and not a real number.

Josef
  • Thanks for your response. The thing is that when I print poi_results.mu, I get what you just described, the rate for each sample (generally a float). However, when I print the predictions I also get another set of floats (meaning they cannot be the same thing, i.e. the mean of the Poisson distribution for each sample). I assumed the second set were realizations of the samples and thus should be whole numbers. If both are rates (maybe different ones!), how should I create a realization from them? Thanks a lot. – mani Mar 22 '18 at 13:53
  • poi_results.mu is only for the training samples and is the same as fittedvalues. predict can be used for out-of-sample prediction with new exog. You can use the predicted mean with scipy.stats.poisson to get access to all distribution methods, such as pmf, cdf and rvs. – Josef Mar 22 '18 at 14:42
  • Thanks for your help, that is exactly right. Too bad I saw your answer too late. For future reference, you have to call predict on the fitted model, not the original model (on poi_results, not on poi_model as I did above). Also, as you pointed out, statsmodels only produces the mu's (means); for realizations you need to go to scipy.stats. – mani Mar 22 '18 at 16:47
  • I wonder if any of you can elaborate a bit more on how to turn the mu's (the means) into actual realizations using scipy.stats.poisson. I am confused about how to combine the intensity from the Poisson model in statsmodels with scipy. – Irene Jul 26 '18 at 13:04
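
As a rough sketch of what the comments above describe: the predicted means can be passed to scipy.stats.poisson as the mu argument. The array mu below is a hypothetical stand-in for the output of poi_results.predict(testx), and the seed and number of draws are arbitrary.

```python
import numpy as np
from scipy import stats

# Hypothetical predicted means, standing in for poi_results.predict(testx).
mu = np.array([0.7, 2.3, 5.1])

# One integer realization per observation.
draws = stats.poisson.rvs(mu, random_state=0)

# Many realizations per observation: shape (n_draws, n_obs).
many = stats.poisson.rvs(mu, size=(1000, mu.size), random_state=0)

# Probability of observing exactly 0 for each observation.
p_zero = stats.poisson.pmf(0, mu)

# Probability of observing 3 or fewer for each observation.
p_le3 = stats.poisson.cdf(3, mu)

print(draws)              # whole numbers
print(many.mean(axis=0))  # close to mu
print(p_zero, p_le3)
```

The draws are integers, while their average over many draws approaches the float means, which is the distinction the answer makes between the predicted mean and a realization.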