I have been trying to find a way to fit some of my columns (which contain user click data) to a Poisson distribution in Python. These columns (e.g., click_website_1, click_website_2) may contain values ranging from 1 to several thousand. I am trying to do this because some resources recommend it:

We recommend that count data should not be analysed by log-transforming it, but instead models based on Poisson and negative binomial distributions should be used.

I found some methods in scipy and numpy, but they seem to generate random numbers that follow a Poisson distribution. What I am interested in is fitting my own data to a Poisson distribution. Are there any library suggestions for doing this in Python?

renakre
  • `convert my own data to poisson distribution` - it's not clear what this is supposed to mean. – cel Feb 26 '17 at 06:37
  • @cel I mean **fit my data to poisson distribution**? If the phrasing is wrong, what should I do to follow the recommendation of using a Poisson distribution? – renakre Feb 26 '17 at 06:40
  • The Poisson distribution has only a single rate parameter. You can estimate it from your data using the maximum likelihood estimator. The form is described on Wikipedia: https://en.wikipedia.org/wiki/Poisson_distribution#Maximum_likelihood – cel Feb 26 '17 at 06:44
  • @cel thanks! is there a way to do this using a python library? There is this: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html but not sure how to adapt this to my data? – renakre Feb 26 '17 at 06:49
  • Have you had a look at the formula? It's just calculating the mean of your observations. You will find ways to do that in numpy. You only need scipy's Poisson distribution if you want to work with the distribution once you have estimated the rate parameter. – cel Feb 26 '17 at 07:05
  • As @cel pointed out, the right question is not how to fit your data to a Poisson distribution but how to check whether your data follows one: http://stats.stackexchange.com/questions/1174/how-can-i-test-if-given-samples-are-taken-from-a-poisson-distribution Your reasoning looks fine, but you need to validate the assumption (which often fails). – abhiieor Feb 26 '17 at 07:11

1 Answer

Here is a quick way to check whether your data follows a Poisson distribution. You plot the pmf under the assumption that the data follows a Poisson distribution with rate parameter lambda = data.mean().

import numpy as np
from scipy.special import factorial  # scipy.misc.factorial was removed from SciPy


def poisson(k, lamb):
    """Poisson pmf evaluated at count k; lamb is the rate (fit) parameter."""
    return (lamb**k / factorial(k)) * np.exp(-lamb)

# let's collect the clicks, since we are going to need them later
clicks = df["click_website_1"]

Here we use the pmf of the Poisson distribution.

Now let's do some modeling: from the data (click_website_1) we'll estimate the Poisson parameter using the MLE, which turns out to be just the sample mean.
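As a quick sanity check on the claim that the MLE is the sample mean, the sketch below maximizes the Poisson log-likelihood numerically and compares the result with the mean. The synthetic `sample` array, its rate of 12, and the helper name `neg_log_likelihood` are assumptions made for illustration; they stand in for a real click column.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson as poisson_dist

# Synthetic stand-in for a click column (assumption, not the asker's data).
rng = np.random.default_rng(0)
sample = rng.poisson(lam=12.0, size=5000)


def neg_log_likelihood(lam):
    """Negative Poisson log-likelihood of `sample` for rate `lam`."""
    return -poisson_dist.logpmf(sample, lam).sum()


# Search numerically for the rate that maximizes the likelihood.
result = minimize_scalar(
    neg_log_likelihood,
    bounds=(1e-6, float(sample.max()) + 1.0),
    method="bounded",
)
numeric_mle = result.x
closed_form_mle = sample.mean()

# The two estimates should agree up to the optimizer's tolerance.
print(numeric_mle, closed_form_mle)
```

In practice there is no need to run an optimizer for this; the sketch only illustrates why taking the mean is enough.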

lamb = clicks.mean()

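# plot the pmf using lamb as an estimate for `lambda`.
# let's sort the counts in the column first.
# note: for large counts, lamb**k and factorial(k) overflow float64;
# scipy.stats.poisson.pmf avoids this.

clicks.sort_values().apply(poisson, args=(lamb,)).plot()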
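To judge the fit rather than only plot the fitted pmf, one option is to compare the observed relative frequency of each count with the probability the fitted distribution assigns to it, and to look at the variance-to-mean ratio. The following is a minimal sketch under stated assumptions: `sample` is synthetic data standing in for `df["click_website_1"]`, and `observed`, `expected`, `dispersion`, and `max_gap` are illustrative names, not part of any library.

```python
import numpy as np
from scipy.stats import poisson as poisson_dist

# Synthetic counts standing in for a real click column (assumption).
rng = np.random.default_rng(42)
sample = rng.poisson(lam=4.0, size=10_000)

lam_hat = sample.mean()  # MLE of the Poisson rate

# Empirical relative frequency of each count value 0..max.
values = np.arange(sample.max() + 1)
observed = np.bincount(sample, minlength=values.size) / sample.size

# Probability the fitted Poisson assigns to the same values.
expected = poisson_dist.pmf(values, lam_hat)

# For a Poisson variable the variance equals the mean, so this ratio
# should be close to 1; values well above 1 indicate overdispersion.
dispersion = sample.var(ddof=1) / lam_hat

# Largest absolute difference between observed and fitted probabilities.
max_gap = np.abs(observed - expected).max()
print(f"lambda_hat={lam_hat:.3f}, dispersion={dispersion:.3f}, max gap={max_gap:.4f}")
```

If the dispersion ratio on real click data is far above 1, a plain Poisson is probably too restrictive, and the negative binomial mentioned in the quoted recommendation is the usual next thing to try.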
parsethis