2

I have a Mathematica code that calculates the 95% confidence intervals of a Cumulative Distribution Function (CDF) obtained from a specific Probability Distribution Function (PDF). The PDF is ugly, as it contains an Hypergeometric 2F1 function, and I need to calculate the 2-sigma errorbars of a data set of 15 values.

I want to translate this code to Python, but I get a very significant divergence on the second half of the values.

Mathematica code

results are the lower and upper 2-sigma confidence level for the values in xdata. That is, xdata should always fall between the two corresponding results values.

navs  = {10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173,  5290, 8816};
freqs = {0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185,   0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166,   0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571}
xdata = {0.578064980346793,   0.030812200935204,   0.316777979844816,  
         0.353718150091612,   0.287659600326548,   0.269254388840293,  
         0.16545714457921,   0.138759871084825,    0.0602382519940077,   
         0.10120771961,  0.065311134782518,    0.105235790998594,   
         0.124642033979457,   0.0271909963701794,  0.0686653810421847};
data = MapThread[{#1, #2, #3} &, {navs, freqs, xdata}]

post[x_, n_, y_] = 
     (n - 1) (1 - x)^n (1 - y)^(n - 2) Hypergeometric2F1[n, n, 1, x*y]

integral = Map[(values = #; mesh = Subdivide[0, 1, 1000]; 
     Interpolation[
      DeleteDuplicates[{Map[
            SetPrecision[post[#, values[[1]], values[[3]]^2], 100] &, 
            mesh] // (Accumulate[#] - #/2 - #[[1]]/
               2) & // #/#[[-1]] &, 
         mesh}\[Transpose], (#1[[1]] == #2[[1]] &)], 
      InterpolationOrder -> 1]) &, data];

results = 
 MapThread[{Sqrt[#1[.025]], Sqrt[#1[0.975]]} &, {integral, data}]

{{0.207919, 0.776508}, {0.0481485, 0.535278}, {0.0834002, 0.574447}, 
{0.137742, 0.551035}, {0.121376, 0.455097}, {0.136889, 0.403306}, 
{0.0674029, 0.279408}, {0.0612534, 0.228762}, {0.0158357, 0.134521}, 
{0.0525374, 0.156055}, {0.0270589, 0.108861}, {0.0740978, 0.137691}, 
{0.100498, 0.149646}, {0.00741129, 0.0525161}, {0.0507748, 0.0850961}}

Python code

Here's my translation: results are the same quantity as before, truncated to the 7th digit to increase readability.

The results values I get start diverging from the 7th pair of values on, and the last four points of xdata do not fall between the two corresponding results values.

import numpy as np
from scipy.integrate import cumtrapz
from scipy.interpolate import interp1d
from mpmath import *

mesh = list(np.linspace(0,1,1000));

navs = [10, 10, 18, 30, 52, 87, 147, 245, 410, 684, 1141, 1903, 3173, 5290, 8816]
freqs = [0.00002, 0.00004, 0.0000666667, 0.000111111, 0.000185185, 0.000308642, 0.000514403, 0.000857339, 0.00142893, 0.00238166, 0.00396944, 0.00661594, 0.0165426, 0.0220568, 0.027571]
xdata = [0.578064980346793, 0.030812200935204, 0.316777979844816, 
0.353718150091612,0.287659600326548, 0.269254388840293,
0.16545714457921, 0.138759871084825, 0.0602382519940077, 
0.10120771961, 0.065311134782518, 0.105235790998594, 
0.124642033979457, 0.0271909963701794, 0.0686653810421847]

def post(x,n,y):
    post = (n-1)*((1-x)**n)*((1-y)**(n-2))*hyp2f1(n,n,1,x*y)
    return post

# setting the numeric precision to 100 as in Mathematica
# trying to get the most precise hypergeometric function values
mp.dps = 100
mp.pretty = True

results = []

for i in range(len(navs)):
    postprob = [];
    for j in range(len(mesh)):    
        posterior = post(mesh[j], navs[i], xdata[i]**2)
        postprob.append(posterior)
# calculate the norm of the pdf for integration
    norm = np.trapz(np.array(postprob),mesh);
# integrate pdf/norm to obtain cdf
    integrate = list(np.unique(cumtrapz(np.array(postprob)/norm, mesh, initial=0)));
    mesh2 = list(np.linspace(0,1,len(integrate)));
# interpolate inverse cdf to obtain the 2sigma quantiles
    icdf = interp1d(integrate, mesh2, bounds_error=False, fill_value='extrapolate');
    results.append(list(np.sqrt(icdf([0.025, 0.975]))))

results

[[0.2079198, 0.7765088], [0.0481485, 0.5352773], [0.0834, 0.5744489],
 [0.1377413, 0.5510352], [0.1218029, 0.4566994], [0.1399324, 0.4122767],
 [0.0733743, 0.3041607], [0.0739691, 0.2762597], [0.0230135, 0.1954886],
 [0.0871462, 0.2588804], [0.05637, 0.2268962],   [0.1731199, 0.3217401],
 [0.2665897, 0.3969059], [0.0315915, 0.2238736], [0.2224567, 0.3728803]]

Thanks to the comments to this question, I found out that:

  • The hypergeometric function gives different results in the two languages. With the same input values i get that: In Mathematica Hypergeometric2F1 gives me as a result 1.0588267, while in Python mpmath.hyp2f1 gives 1.0588866. This is the very second point of the mesh, and the difference in in the fifth decimal place.

Is there somewhere a better definition of this special function I was not able to find?

  • I still don't know if this is only due to the Hypergeometric function or also to the integration method, but that is definitely a starting point.

(I am fairly new to Python, maybe the code is a bit naive)

  • 1
    Could you think of one or more related integrals that you could perform both in Mathematica and in Python that are related to your problem and that you know exactly what the result should be? Could one or more of these eliminate some of the possibilities for the error that you are seeing or lead you to further hypotheses that might help you identify the source of the problem? – Bill Mar 25 '19 at 21:16
  • @Bill I did some double check on the values i get both in Mathematica and Python for the hypergeometric function, and they diverge right from the start. It looks like the problem is **at least** in the different definition of this special function. E.g. in Mathematica I get Hypergeometric2F1[10,10,1,xdata[[1]]/1000] = 1.0588267, while in Python with the same values I get hyp2f1[10,10,1,xdata[0]/1000] = 1.0588866. –  Apr 01 '19 at 12:20

0 Answers0