I am trying to calculate the empirical cumulative distribution function (ECDF) of images in Python. What is the best practice for doing so? I also need the result stored in an array so that I can use it in later steps of my analysis.

I am using this function and I am not sure if it is the right way to do it:

`import numpy as np

def ecdf(data):
    # sort all pixel values; the i-th smallest value has ECDF i/n
    x = np.sort(data.flatten())
    n = x.size
    y = np.arange(1, n+1) / n
    return (x, y)`
Taylor Reece
Amir Charkhi
  • The goal is to have the cumulative distribution of intensities in the pixels, right? Are the images colored? – gtancev Aug 11 '20 at 03:51
  • Yes, the cumulative distribution of the pixel intensities. The images are grayscale. Thanks – Amir Charkhi Aug 11 '20 at 05:05
  • I believe [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html) does what you're looking for. – Big Bro Aug 11 '20 at 05:38
  • @BigBro any more info? – Amir Charkhi Aug 11 '20 at 06:36
  • Well, if you do `np.histogram(data.flatten())`, you get a tuple `values, bins` representing the distribution in your image. If you want the cumulative values, you can do `cum_values = np.cumsum(values)`. Then you can plot it, for example: `plt.plot(bins[1:], cum_values)` (`bins` holds the bin edges, so it has one more element than `values`), which should give you a nice distribution graph. If it's not what you're looking for, can you be a little more precise about what you want to do with the ECDF? – Big Bro Aug 11 '20 at 07:25
  • @BigBro thanks mate, this assures me I am on the right path. – Amir Charkhi Aug 12 '20 at 01:43

1 Answer

Here is how I am doing this now and it works (for a grayscale image):

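  1. For the normalized histogram (density) of intensities:
import numpy as np

def Ghist(image):
    '''compute the normalized histogram (density) of an image'''
    # np.histogram bins unsorted data itself, so no sort is needed
    data_flatten = image.flatten()
    # density=True replaces normed=True, which recent NumPy versions removed
    values, bins = np.histogram(data_flatten, density=True)

    # bins holds the bin edges, so it has one more element than values
    return (bins, values)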
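  2. For cumulative distribution:
def ecdf(image):
    '''compute eCDF of an image'''
    data_flatten = image.flatten()
    # density=True replaces normed=True, which recent NumPy versions removed
    values, bins = np.histogram(data_flatten, density=True)
    # weight each density by its bin width so the cumulative sum ends at 1
    data_cum = np.cumsum(values * np.diff(bins))

    # return the right-hand bin edges so both arrays have the same length
    return (bins[1:], data_cum)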
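
Note that the histogram version is binned (10 bins by default), so it only approximates the ECDF. For 8-bit grayscale images the exact ECDF can be stored in a fixed 256-element array, one value per intensity level. Here is a minimal sketch of that idea, assuming the image is a `uint8` NumPy array; the name `ecdf_uint8` and the random test image are only for illustration and are not part of my actual pipeline:

```python
import numpy as np

def ecdf_uint8(image):
    """Exact ECDF of an 8-bit grayscale image, one value per intensity level."""
    image = np.asarray(image)
    if image.dtype != np.uint8:
        # Assumption of this sketch: intensities are integers in 0..255
        raise ValueError("this sketch assumes a uint8 image")
    # Count how many pixels take each intensity level 0..255
    counts = np.bincount(image.ravel(), minlength=256)
    levels = np.arange(256)
    # cdf[k] is the fraction of pixels with intensity <= k
    cdf = np.cumsum(counts) / image.size
    return levels, cdf

# Synthetic stand-in for a real grayscale image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
levels, cdf = ecdf_uint8(img)
```

Both `levels` and `cdf` have length 256 regardless of image size, which makes the result easy to compare across images.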
Amir Charkhi