I have a few large datasets that are discrete in nature. I want to fit several distribution functions to the data in order to understand its outliers. However, I cannot work out how to obtain the parameters those distributions need, such as the "p" value and the variance. Is there a generic way to estimate these values?
2 Answers
How about:
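import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import factorial  # works element-wise on arrays

def poisson(k, lamb):
    # Poisson probability mass function evaluated at k
    return (lamb ** k / factorial(k)) * np.exp(-lamb)

# data_list is your data; one bin per integer 0..50 so the bin middles are integers
entries, bin_edges, patches = plt.hist(data_list, density=True, bins=np.arange(52) - 0.5)
# calculate bin middles
bin_middles = 0.5 * (bin_edges[1:] + bin_edges[:-1])
# fit with curve_fit, using only the bins that are not (almost) empty
mask = entries > 0.001
parameters, cov = curve_fit(poisson, bin_middles[mask], entries[mask])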
This gives you a Poisson function and its fitted parameter: parameters[0] is the estimated rate lamb, and cov is its estimated variance.
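As a cross-check on the histogram fit, the maximum-likelihood estimate of a Poisson rate is simply the sample mean, and a Poisson distribution has variance equal to its mean. The sketch below is only an illustration: the simulated `data_list`, the seed, and the 99.9% cut-off used to flag outliers are assumptions, not part of the answer above.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical sample standing in for your own data_list
rng = np.random.default_rng(0)
data_list = rng.poisson(lam=4.2, size=10_000)

# Maximum-likelihood estimate of the Poisson rate is the sample mean
lamb_mle = data_list.mean()

# For Poisson data the variance should be close to the mean;
# a ratio far from 1 suggests a Poisson model is a poor choice
sample_var = data_list.var(ddof=1)
dispersion = sample_var / lamb_mle

# Flag values above the 99.9% quantile of the fitted distribution
# (the 0.999 threshold is an arbitrary choice for illustration)
upper = poisson.ppf(0.999, lamb_mle)
outliers = data_list[data_list > upper]

print(lamb_mle, dispersion, upper, outliers.size)
```

If the dispersion ratio is well above 1, the data are overdispersed and the outlier threshold from a Poisson fit will flag too many points.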

I think you are looking for the chi-square goodness-of-fit test. It tests whether a sample of data came from a population with a specific distribution, and it works for discrete distributions such as the binomial and Poisson. More information on how to perform this analysis in Python can also be found here: Performing a Chi-Square goodness-of-fit test.
>>> from scipy.stats import chisquare
>>> chisquare(f_obs=[16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
(3.5, 0.62338762774958223)
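To come up with the expected distribution you can make use of the following (pmf returns probabilities, so multiply them by your sample size to get the expected counts for f_exp):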
>>> from scipy.stats import binom, poisson
>>> n, p = 5, 0.4
>>> mean, var, skew, kurt = binom.stats(n, p, moments='mvsk')
>>> f_binom_exp = binom.pmf(range(n + 1), n, p)
>>> mu = 0.6
>>> mean, var, skew, kurt = poisson.stats(mu, moments='mvsk')
>>> f_poisson_exp = poisson.pmf(range(n + 1), mu)
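The two snippets above can be combined into one test. This is a minimal sketch under stated assumptions: the simulated `sample`, the cut-off `k_max`, and the choice to lump the tail into a single "k_max or more" category are illustrative, not something the answer prescribes. The lumping keeps the expected counts from becoming tiny and makes the observed and expected totals match, which `chisquare` requires.

```python
import numpy as np
from scipy.stats import poisson, chisquare

# Hypothetical data standing in for your own sample
rng = np.random.default_rng(1)
sample = rng.poisson(lam=0.6, size=500)

mu = sample.mean()   # estimated Poisson rate
k_max = 3            # last category means "k_max or more" (illustrative choice)

# Observed counts for 0, 1, ..., k_max - 1 plus a lumped tail >= k_max
f_obs = np.bincount(np.minimum(sample, k_max), minlength=k_max + 1)

# Expected probabilities: pmf for 0..k_max - 1, survival function for the tail
probs = np.append(poisson.pmf(np.arange(k_max), mu), poisson.sf(k_max - 1, mu))
f_exp = probs * sample.size   # probabilities -> expected counts

# ddof=1 because one parameter (mu) was estimated from the data
stat, pvalue = chisquare(f_obs, f_exp, ddof=1)
print(stat, pvalue)
```

A small p-value indicates the data are unlikely to come from the fitted Poisson distribution; a large one only means the test found no evidence against it.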
In case you want to select the distribution that fits your data as well as possible, you can optimize the goodness of fit by varying the parameters of the distributions.
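One way to do that parameter search, sketched here for the binomial "p", is to minimize the negative log-likelihood numerically. The assumptions are loud ones: the number of trials `n` is taken as known, the data are simulated, and maximum likelihood is swapped in for the chi-square statistic as the quantity being optimized. For this simple case a closed form exists (sample mean divided by `n`), so the numerical result can be checked against it.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

# Hypothetical data: n is assumed known, p is what we want to estimate
rng = np.random.default_rng(2)
n = 5
sample = rng.binomial(n, 0.4, size=2000)

def neg_log_likelihood(p):
    # Sum of log-probabilities of every observation under Binomial(n, p)
    return -binom.logpmf(sample, n, p).sum()

# Search p on the open interval (0, 1)
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_hat = result.x

# Closed-form estimate for comparison, and the variance implied by the fit
p_closed_form = sample.mean() / n
var_hat = n * p_hat * (1 - p_hat)
print(p_hat, p_closed_form, var_hat)
```

The same pattern extends to distributions without a closed-form estimate by replacing `binom.logpmf` and using a multi-parameter optimizer.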
Your question is not entirely clear to me, so I'm afraid I cannot be of further assistance at the moment, but I think the most important utilities are at least described here. Good luck!
