I have a Mathematica code that calculates the 95% confidence intervals of a Cumulative Distribution Function (CDF) obtained from a specific Probability Distribution Function (PDF). The PDF is ugly, as it contains an Hypergeometric 2F1 function, and I need to calculate the 2-sigma errorbars of a data set of 15 values.
I want to translate this code to Python, but I get a very significant divergence on the second half of the values.
Mathematica code
results
are the lower and upper 2-sigma confidence level for the values in xdata
. That is, xdata
should always fall between the two corresponding results
values.
navs = {10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173, 5290, 8816};
freqs = {0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185, 0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166, 0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571}
xdata = {0.578064980346793, 0.030812200935204, 0.316777979844816,
0.353718150091612, 0.287659600326548, 0.269254388840293,
0.16545714457921, 0.138759871084825, 0.0602382519940077,
0.10120771961, 0.065311134782518, 0.105235790998594,
0.124642033979457, 0.0271909963701794, 0.0686653810421847};
data = MapThread[{#1, #2, #3} &, {navs, freqs, xdata}]
post[x_, n_, y_] =
(n - 1) (1 - x)^n (1 - y)^(n - 2) Hypergeometric2F1[n, n, 1, x*y]
integral = Map[(values = #; mesh = Subdivide[0, 1, 1000];
Interpolation[
DeleteDuplicates[{Map[
SetPrecision[post[#, values[[1]], values[[3]]^2], 100] &,
mesh] // (Accumulate[#] - #/2 - #[[1]]/
2) & // #/#[[-1]] &,
mesh}\[Transpose], (#1[[1]] == #2[[1]] &)],
InterpolationOrder -> 1]) &, data];
results =
MapThread[{Sqrt[#1[.025]], Sqrt[#1[0.975]]} &, {integral, data}]
{{0.207919, 0.776508}, {0.0481485, 0.535278}, {0.0834002, 0.574447},
{0.137742, 0.551035}, {0.121376, 0.455097}, {0.136889, 0.403306},
{0.0674029, 0.279408}, {0.0612534, 0.228762}, {0.0158357, 0.134521},
{0.0525374, 0.156055}, {0.0270589, 0.108861}, {0.0740978, 0.137691},
{0.100498, 0.149646}, {0.00741129, 0.0525161}, {0.0507748, 0.0850961}}
Python code
Here's my translation: results
are the same quantity as before, truncated to the 7th digit to increase readability.
The results
values I get start diverging from the 7th pair of values on, and the last four points of xdata
do not fall between the two corresponding results
values.
import numpy as np
from scipy.integrate import cumtrapz
from scipy.interpolate import interp1d
from mpmath import *
mesh = list(np.linspace(0,1,1000));
navs = [10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173, 5290, 8816]
freqs = [0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185, 0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166, 0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571]
xdata = [0.578064980346793, 0.030812200935204, 0.316777979844816,
0.353718150091612,0.287659600326548, 0.269254388840293,
0.16545714457921, 0.138759871084825, 0.0602382519940077,
0.10120771961, 0.065311134782518, 0.105235790998594,
0.124642033979457, 0.0271909963701794, 0.0686653810421847]
def post(x,n,y):
post = (n-1)*((1-x)**n)*((1-y)**(n-2))*hyp2f1(n,n,1,x*y)
return post
# setting the numeric precision to 100 as in Mathematica
# trying to get the most precise hypergeometric function values
mp.dps = 100
mp.pretty = True
results = []
for i in range(len(navs)):
postprob = [];
for j in range(len(mesh)):
posterior = post(mesh[j], navs[i], xdata[i]**2)
postprob.append(posterior)
# calculate the norm of the pdf for integration
norm = np.trapz(np.array(postprob),mesh);
# integrate pdf/norm to obtain cdf
integrate = list(np.unique(cumtrapz(np.array(postprob)/norm, mesh, initial=0)));
mesh2 = list(np.linspace(0,1,len(integrate)));
# interpolate inverse cdf to obtain the 2sigma quantiles
icdf = interp1d(integrate, mesh2, bounds_error=False, fill_value='extrapolate');
results.append(list(np.sqrt(icdf([0.025, 0.975]))))
results
[[0.2079198, 0.7765088], [0.0481485, 0.5352773], [0.0834, 0.5744489],
[0.1377413, 0.5510352], [0.1218029, 0.4566994], [0.1399324, 0.4122767],
[0.0733743, 0.3041607], [0.0739691, 0.2762597], [0.0230135, 0.1954886],
[0.0871462, 0.2588804], [0.05637, 0.2268962], [0.1731199, 0.3217401],
[0.2665897, 0.3969059], [0.0315915, 0.2238736], [0.2224567, 0.3728803]]
Thanks to the comments to this question, I found out that:
- The hypergeometric function gives different results in the two languages. With the same input values i get that: In Mathematica
Hypergeometric2F1
gives me as a result1.0588267
, while in Pythonmpmath.hyp2f1
gives1.0588866
. This is the very second point of the mesh, and the difference in in the fifth decimal place.
Is there somewhere a better definition of this special function I was not able to find?
- I still don't know if this is only due to the Hypergeometric function or also to the integration method, but that is definitely a starting point.
(I am fairly new to Python, maybe the code is a bit naive)