Background
I am working on computing a series of best-fit gamma curves for a 2-D dataset in Numpy (ndarray), a prior question for the genesis of this can be found here.
Scipy was previously utilized (scipy.gamma.stats), however this library is not optimized for multi-dimensional arrays and a barebones function was written to meet the objective. I've successfully fit (albeit not as cleanly as Scipy) a curve to the dataset, which is provided below.
Current Issue
I want to obtain the percentile of a given value and vice versa along the calculated gamma distribution. However, I'm not obtaining expected values off the fitted curve. For example, providing the 50th percentile yields a value of 4.471, which does not match up with the curve fit shown below. What modifications or wholesale alterations can be made to yield both percentiles and values from supplied data?
Graph
Code
import sys, os, math
import numpy as np
import scipy as sci
import matplotlib.pyplot as plt
data = np.array([0.00, 0.00, 11.26399994, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 17.06399918, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 8.33279991, 0.00, 7.54879951, 0.00, 0.00, 0.00, 4.58799982, 7.9776001, 0.00, 0.00, 0.00, 0.00, 11.45040035, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 18.73279953, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 8.94559956, 0.00, 7.73040009, 0.00, 0.00, 0.00, 5.03599977, 8.62639999, 0.00, 0.00, 0.00, 0.00, 11.11680031, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 14.37839985, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 8.16479969, 0.00, 7.30719948, 0.00, 0.00, 0.00, 3.41039991, 7.17280006, 0.00, 0.00, 0.00, 0.00, 10.0099199963, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 13.97839928, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 7.6855998, 0.00, 6.86559963, 0.00, 0.00, 0.00, 3.21600008, 7.93599987, 0.00, 0.00, 0.00, 0.00, 11.55999947, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 18.76399994, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 10.0033039951, 0.00, 8.10639954, 0.00, 0.00, 0.00, 4.76480007, 6.87679958, 0.00, 0.00, 0.00, 0.00, 11.42239952, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 19.42639732, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 10.0052400017, 0.00, 8.2567997, 0.00, 0.00, 0.00, 5.08239985, 7.9776001, 0.00, 0.00, 0.00, 0.00, 10.0099839973, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 11.5855999, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 7.88399982, 0.00, 5.96799994, 0.00, 0.00, 0.00, 3.07679987, 7.81360006, 0.00, 0.00, 0.00, 0.00, 11.51119995, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 20.0030959892, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 10.0050879955, 0.00, 8.20479965, 0.00, 0.00, 0.00, 5.51599979, 9.02879906, 0.00, 0.00])
def scigamma(data):
param = sci.stats.gamma.fit(data)
x = np.linspace(0, np.max(data), 250)
cdf = sci.stats.gamma.cdf(x, *param)
value = np.round((sci.stats.gamma.cdf(0.10, *param) * 100), 2)
percentile = np.round((sci.stats.gamma.ppf(50.00, *param) * 100), 2)
return cdf
scicdf = scigamma(data)
# Method of Moments estimation
mean = np.mean(data)
variance = np.var(data)
alpha = mean**2 / variance
beta = variance / mean
# Generate x-axis values for the curves
x = np.linspace(0, np.max(data), 250)
# Calculate the gamma distribution PDF values
pdf = (x ** (alpha - 1) * np.exp(-x / beta)) / (beta ** alpha * np.math.gamma(alpha))
# Calculate the gamma distribution CDF values
cdf = np.zeros_like(x)
cdf[x > 0] = np.cumsum(pdf[x > 0]) / np.sum(pdf[x > 0])
# Estimate the probability of zero values
num_zeros = np.count_nonzero(data == 0)
zero_probability = np.count_nonzero(data == 0) / len(data)
# Calculate the PDF and CDF values at zero
pdf_zero = zero_probability / (beta ** alpha * np.math.gamma(alpha))
cdf_zero = zero_probability
value = 2.50
percentile = 0.50
index = np.argmax(pdf >= value)
# Calculate the percentile using numerical integration
pct = np.trapz(pdf[:index+1], dx=1) + (value - pdf[index]) * (cdf[index] - cdf[index-1]) / (pdf[index-1] - pdf[index])
index = np.argmax(cdf >= percentile)
# Calculate the value using numerical integration
val = np.trapz(cdf[:index+1], dx=1) + (percentile - cdf[index-1]) * (pdf[index] - pdf[index-1]) / (cdf[index] - cdf[index-1])
# Plot the data histogram
plt.hist(data, bins=30, density=True, alpha=0.5, label='data')
# Plot the gamma distribution CDF curve
plt.plot(x, cdf, 'b', label='Gamma CDF | Custom Fit')
plt.plot(x, scicdf, 'k', label='Gamma CDF | SciPy Fit')
# Set plot labels and legend
plt.xlabel('data')
plt.ylabel('Probability')
plt.legend()