I have a few large datasets that are discrete in nature. I want to fit the data to a few distribution functions in order to understand the outliers in the data. However, I cannot find the parameters I need to do so, such as the "p" value and the variance of the data. Is there a generic way to identify those parameter values?

Sitz Blogz

2 Answers

How about:

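import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import factorial


def poisson(k, lamb):
    # Poisson probability mass function evaluated at k for rate lamb
    return (lamb ** k / factorial(k)) * np.exp(-lamb)


# data_list is your sample of non-negative integer counts.
# Unit-width bins centred on the integers 0..50 keep k integer-valued.
entries, bin_edges, patches = plt.hist(data_list, density=True, bins=np.arange(-0.5, 51.5))
# calculate bin middles
bin_middles = 0.5 * (bin_edges[1:] + bin_edges[:-1])

# fit with curve_fit, starting from the sample mean and keeping lamb non-negative
parameters, cov = curve_fit(poisson, bin_middles, entries, p0=[np.mean(data_list)], bounds=(0, np.inf))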

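This gives you a fitted Poisson function and its parameter: parameters[0] is the estimated rate lamb, and cov is the estimated covariance of that estimate (a 1x1 matrix here).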
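
As a cross-check on the curve fit, it may help to know that the maximum-likelihood estimate of a Poisson rate is simply the sample mean, and that a Poisson variable has variance equal to its mean. The sketch below is only an illustration: it generates a synthetic sample (the rate 4.2, the seed and the sample size are assumptions, not values from the question) in place of your own data_list.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for data_list: 10,000 Poisson draws with rate 4.2
data_list = rng.poisson(lam=4.2, size=10_000)

# The maximum-likelihood estimate of the Poisson rate is the sample mean
lamb_mle = data_list.mean()

# For Poisson data the variance should be close to the mean;
# a much larger variance hints that a Poisson model is a poor fit
sample_var = data_list.var(ddof=1)

print(f"lambda (MLE) = {lamb_mle:.3f}, sample variance = {sample_var:.3f}")
```

If the rate from curve_fit differs a lot from the sample mean, the binning or the fit range is a likely culprit.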

Lazloo Xp

I think you are looking for the Chi-Square goodness-of-fit test. It tests whether a sample of data came from a population with a specific distribution, and it works for discrete distributions such as the binomial and Poisson. More information on how to perform this analysis in Python can also be found here: Performing a Chi-Square goodness-of-fit test.

>>> from scipy.stats import chisquare
>>> chisquare(f_obs=[16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
(3.5, 0.62338762774958223)

To come up with the expected distribution you can make use of:

>>> from scipy.stats import binom, poisson

>>> n, p = 5, 0.4
>>> mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
>>> f_binom_exp = binom.pmf(range(n + 1), n, p)

>>> mu = 0.6
>>> mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')
>>> f_poisson_exp = poisson.pmf(range(n + 1), mu)
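
Note that chisquare compares counts, so the pmf values have to be scaled by the sample size, and the observed and expected totals have to match. One way to do this, sketched here under assumptions (synthetic data, an arbitrary cut-off k_max, and the tail lumped into a single ">= k_max" category), is:

```python
import numpy as np
from scipy.stats import chisquare, poisson

rng = np.random.default_rng(1)
sample = rng.poisson(lam=0.6, size=500)  # hypothetical data, not from the question

mu_hat = sample.mean()  # estimated Poisson rate
k_max = 3               # illustrative cut-off: last category holds all values >= k_max

# Observed counts for 0, 1, ..., k_max - 1 plus the lumped tail
f_obs = np.bincount(np.minimum(sample, k_max), minlength=k_max + 1)

# Expected probabilities: pmf for 0..k_max - 1, survival function for the tail
probs = np.append(poisson.pmf(np.arange(k_max), mu_hat),
                  poisson.sf(k_max - 1, mu_hat))
f_exp = probs * sample.size  # expected counts, same total as f_obs

# ddof=1 because one parameter (the rate) was estimated from the data
stat, p_value = chisquare(f_obs, f_exp, ddof=1)
print(stat, p_value)
```

Lumping the tail keeps the expected counts from becoming very small, which is where the chi-square approximation tends to break down.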

In case you want to select the distribution that fits your data as well as possible, you can try to optimize the goodness-of-fit by varying the parameters of the distributions.
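
A rough sketch of that idea for a single Poisson rate follows. It minimizes the chi-square statistic over mu with scipy.optimize.minimize_scalar; the synthetic data, the cut-off k_max, the search bounds and the helper name chi2_stat are all assumptions made for illustration, and this is only one of several possible ways to tune the parameters.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chisquare, poisson

rng = np.random.default_rng(2)
sample = rng.poisson(lam=3.0, size=2_000)  # hypothetical data

k_max = 8  # illustrative cut-off: values >= k_max share one category
f_obs = np.bincount(np.minimum(sample, k_max), minlength=k_max + 1)


def chi2_stat(mu):
    """Chi-square statistic of the observed counts against Poisson(mu)."""
    probs = np.append(poisson.pmf(np.arange(k_max), mu),
                      poisson.sf(k_max - 1, mu))
    return chisquare(f_obs, probs * f_obs.sum()).statistic


# Search for the rate that gives the smallest chi-square statistic
result = minimize_scalar(chi2_stat, bounds=(0.1, 20.0), method="bounded")
mu_best = result.x

print(f"best mu = {mu_best:.3f}, chi-square = {result.fun:.3f}")
```

The same loop can be repeated for each candidate distribution, comparing the resulting statistics while accounting for the number of fitted parameters.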

Your question is not entirely clear to me, so I'm afraid I cannot be of further assistance at the moment, but I think the most important utilities are at least described here. Good luck!