I'm trying to recreate the following integral with empirical data:

[image: formula of the integral]

where F and G are CDFs and their inverses are the corresponding quantile functions.

Here's my code:

import numpy as np
import statsmodels.api as sm
from scipy.stats.mstats import mquantiles

def eqces(u, v):
    # Empirical CDFs of both samples
    ecdfu = sm.distributions.ECDF(u)
    ecdfv = sm.distributions.ECDF(v)

    # Pooled probability levels; np.unique returns them sorted
    p = np.unique(np.concatenate([ecdfu.y, ecdfv.y]))

    # Empirical quantiles at those levels (mquantiles defaults)
    qfu = mquantiles(u, p)
    qfv = mquantiles(v, p)

    # F(G^-1(p)) and G(F^-1(p))
    uvinv = ecdfu(qfv)
    vuinv = ecdfv(qfu)

    result = np.abs(uvinv - p) + np.abs(vuinv - p)
    return result.sum()

With this I would expect eqces(u, u) = 0 for u = np.random.uniform(0, 1, 50), but this is generally not the case. Can anyone tell me whether I'm doing something wrong, or suggest alternatives?

Edit

This version seems to agree better with some analytical results:

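import numpy as np
import statsmodels.api as sm

def eqces(u, v):
    ecdfu = sm.distributions.ECDF(u)
    ecdfv = sm.distributions.ECDF(v)

    # Pooled ECDF levels and the sample points they belong to
    p = np.concatenate([ecdfu.y, ecdfv.y])
    X = np.concatenate([ecdfu.x, ecdfv.x])

    # Twice the mean of the summed absolute gaps between each ECDF and p
    gaps = np.abs(ecdfu(X) - p) + np.abs(ecdfv(X) - p)
    return 2 * gaps.sum() / p.size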

1 Answer

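My guess is that ECDF and mquantiles don't use the same plotting positions.

mquantiles has the optional keywords alphap=0.4, betap=0.4. With these defaults it interpolates between order statistics, whereas ECDF is a plain step function.

p and uvinv will not round-trip in this case: ecdfu(mquantiles(u, p)) does not generally return p, so eqces(u, u) need not be 0.

However, in large samples the difference should be small.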
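To illustrate the round-trip point, here is a minimal numpy-only sketch in which the quantile function is the exact generalized inverse of the step ECDF. The helper names (ecdf, ecdf_inverse, eqces_step) are hypothetical, not part of statsmodels or scipy. It assumes samples without ties, sums over the pooled levels k/n (leaving out level 0) as in the question's first version, and is not a drop-in replacement for mquantiles.

```python
import numpy as np

def ecdf(sample, x):
    """Right-continuous step ECDF of `sample`, evaluated at `x`."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / s.size

def ecdf_inverse(sample, p):
    """Generalized inverse of the step ECDF: smallest x with ECDF(x) >= p."""
    s = np.sort(np.asarray(sample, dtype=float))
    levels = np.arange(1, s.size + 1) / s.size   # ECDF values at the order statistics
    idx = np.searchsorted(levels, p, side="left")
    return s[np.minimum(idx, s.size - 1)]        # guard against p above the top level

def eqces_step(u, v):
    """Sum of |F(G^-1(p)) - p| + |G(F^-1(p)) - p| over the pooled ECDF levels."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    p = np.union1d(np.arange(1, u.size + 1) / u.size,
                   np.arange(1, v.size + 1) / v.size)
    fu_gv = ecdf(u, ecdf_inverse(v, p))
    gv_fu = ecdf(v, ecdf_inverse(u, p))
    return np.sum(np.abs(fu_gv - p) + np.abs(gv_fu - p))

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 50)
print(eqces_step(u, u))        # identical samples
print(eqces_step(u, u + 0.5))  # shifted sample
```

Because the ECDF and its inverse use the same levels k/n, identical samples round-trip exactly; with interpolating quantiles they generally do not.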

scipy.stats.ks_2samp does something similar, but works directly with numpy, without helper functions.
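As a rough sketch of that idea (not scipy's actual source), both ECDFs can be evaluated at the pooled sample points with np.searchsorted, which avoids quantile functions altogether. The function names pooled_ecdfs and max_ecdf_gap are made up for this example; the maximum gap computed here is the two-sided two-sample Kolmogorov-Smirnov statistic.

```python
import numpy as np

def pooled_ecdfs(u, v):
    """Evaluate the step ECDFs of u and v at every pooled sample point."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    pooled = np.concatenate([u, v])
    cdf_u = np.searchsorted(u, pooled, side="right") / u.size
    cdf_v = np.searchsorted(v, pooled, side="right") / v.size
    return pooled, cdf_u, cdf_v

def max_ecdf_gap(u, v):
    """Largest absolute gap between the two ECDFs over the pooled points."""
    _, cdf_u, cdf_v = pooled_ecdfs(u, v)
    return np.max(np.abs(cdf_u - cdf_v))

rng = np.random.default_rng(1)
u = rng.uniform(0, 1, 50)
print(max_ecdf_gap(u, u))                      # identical samples
print(max_ecdf_gap([0, 1, 2], [10, 11, 12]))   # fully separated samples
```

Replacing the maximum with a sum or mean of the gaps gives an integral-type distance on the same pooled grid.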

BTW: Does this distance measure between the two distributions have a name?

Josef
  • I found it referenced as "Quantile Comparison Effect Size," a metric for the difference between two non-normal, nonparametric (arbitrary) distributions. – alanlujan91 Oct 17 '13 at 14:10