How does one calculate the empirical cumulative distribution (ecdf) of an image in Python?

Question

I am trying to calculate the empirical cumulative distribution of images in Python. What is the best practice for doing so? I also need the result stored in an array so that I can use it in later steps of my analysis.

I am using this function and I am not sure if it is the right way to do it:

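```python
import numpy as np

def ecdf(data):
    x = np.sort(data.flatten())     # pixel values in ascending order
    n = x.size
    y = np.arange(1, n+1) / n       # fraction of pixels at or below each x
    return (x, y)
```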
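The function above returns the exact ECDF as two arrays: the sorted pixel values `x` and the cumulative fractions `y`. With tied intensities, only the last `y` of each run is the ECDF value at that intensity. A minimal sketch of evaluating the ECDF at arbitrary intensities with `np.searchsorted`, which handles ties; the helper `ecdf_at` and the tiny test image are hypothetical:

```python
import numpy as np

def ecdf(data):
    """Exact ECDF: sorted values and cumulative fractions."""
    x = np.sort(np.asarray(data).ravel())
    n = x.size
    y = np.arange(1, n + 1) / n
    return x, y

def ecdf_at(x_sorted, t):
    """Hypothetical helper: fraction of pixels <= t (t may be scalar or array)."""
    # side="right" counts every element <= t, so tied values are all included
    return np.searchsorted(x_sorted, t, side="right") / x_sorted.size

# Made-up 2x3 grayscale "image" for illustration
img = np.array([[0, 10, 10],
                [20, 30, 255]], dtype=np.uint8)
x, y = ecdf(img)
fractions = ecdf_at(x, [10, 25, 255])  # 3/6, 4/6 and 6/6 of the pixels
```

Both `x` and `y` are plain NumPy arrays, so they can be stored and reused in later analysis steps.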

The goal is to have the cumulative distribution of intensities in the pixels, right? Are the images colored? — gtancev, Aug 11 '20 at 03:51
Yes, the cumulative distribution of the pixel intensities. The images are grayscale. Thanks — Amir Charkhi, Aug 11 '20 at 05:05
I believe [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html) does what you're looking for. — Big Bro, Aug 11 '20 at 05:38
Well, if you do `np.histogram(data.flatten())`, you get a tuple `values, bins` representing the distribution in your image. If you want the cumulative values, you can do `cum_values = np.cumsum(values)` (divide by `data.size` to get fractions instead of counts). Then you can plot it, for example with `plt.plot(bins[1:], cum_values)` (`bins` holds one more edge than there are values), which should give you a nice distribution graph. If it's not what you're looking for, can you be a little more precise about what you want to do with the ECDF? — Big Bro, Aug 11 '20 at 07:25
@BigBro thanks mate, this assures me I am on the right path. — Amir Charkhi, Aug 12 '20 at 01:43
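For 8-bit grayscale images, the histogram-plus-cumsum idea from the comments can give an exact per-intensity ECDF by using one bin per grey level. A minimal sketch, assuming the image has dtype `uint8`; `ecdf_uint8` is a hypothetical name, and `np.histogram(image, bins=256, range=(0, 256))` followed by the same `cumsum` gives the same result:

```python
import numpy as np

def ecdf_uint8(image):
    """ECDF of an 8-bit grayscale image, one value per grey level 0..255.

    Assumes image.dtype is uint8 (hypothetical helper).
    """
    counts = np.bincount(image.ravel(), minlength=256)  # pixels per grey level
    cdf = np.cumsum(counts) / image.size                # fraction of pixels <= level
    levels = np.arange(256)
    return levels, cdf

# Made-up 2x3 image: cdf[10] is 0.5 because 3 of the 6 pixels are <= 10
img = np.array([[0, 10, 10],
                [20, 30, 255]], dtype=np.uint8)
levels, cdf = ecdf_uint8(img)
```

Because the result has a fixed length of 256, ECDFs of different images can be stacked into one array and compared level by level.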

Amir Charkhi · Accepted Answer · 2021-11-30T07:05:17.407

Here is how I am doing this now and it works (for a grayscale image):

For the normalized histogram (an empirical probability density, not a Gaussian fit):

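```python
import numpy as np

def Ghist(image):
    '''Compute the normalized histogram (empirical density) of an image.'''
    data_flatten = image.flatten()
    # density=True replaces the old normed=True, which has been removed from NumPy;
    # sorting is unnecessary because np.histogram does not depend on order
    values, bins = np.histogram(data_flatten, density=True)

    return (bins, values)
```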
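With `density=True`, the bar heights are densities: multiplied by the bin widths, they sum to 1. A minimal check on a synthetic image; the random data and the extra `bins` parameter are assumptions for illustration:

```python
import numpy as np

def Ghist(image, bins=10):
    """Normalized histogram (empirical density) of an image's pixel values."""
    values, edges = np.histogram(image.ravel(), bins=bins, density=True)
    return edges, values

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))  # synthetic grayscale image
edges, values = Ghist(image)
area = np.sum(values * np.diff(edges))       # integral of the density, about 1
```

Note that `edges` has one more element than `values`, which matters when plotting or summing them together.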

For the (binned) cumulative distribution:

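```python
import numpy as np

def ecdf(image):
    '''Compute a binned eCDF of an image, evaluated at the upper bin edges.'''
    data_flatten = image.flatten()
    values, bins = np.histogram(data_flatten, density=True)
    # Densities must be multiplied by the bin widths before summing,
    # otherwise the cumulative sum does not end at 1
    data_cum = np.cumsum(values * np.diff(bins))

    # bins has one more element than data_cum; bins[1:] are the upper edges
    return (bins[1:], data_cum)
```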
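This binned version only approximates the exact ECDF from the question. `np.histogram` uses half-open bins `[a, b)` (only the last bin is closed), so the cumulative value at each upper edge is the fraction of pixels strictly below that edge, except at the final edge where it is 1. A sketch that builds the binned CDF from raw counts and checks it against the sorted data; `binned_cdf` and the random image are hypothetical:

```python
import numpy as np

def binned_cdf(image, bins=10):
    """Binned CDF evaluated at the upper edge of each histogram bin."""
    counts, edges = np.histogram(image.ravel(), bins=bins)
    cdf = np.cumsum(counts) / image.size  # counts avoid the density * width step
    return edges[1:], cdf

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(32, 32))  # synthetic grayscale image
upper, cdf = binned_cdf(image)

# Exact fraction of pixels strictly below each upper edge, from sorted data
x = np.sort(image.ravel())
exact_below = np.searchsorted(x, upper, side="left") / x.size
```

Increasing `bins` moves the binned curve closer to the exact ECDF; for integer images, one bin per intensity level makes them coincide.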
