
I want to find the probability distributions of two images so that I can calculate the KL divergence between them.

I'm trying to figure out what "probability distribution" means in this sense. I've converted my images to grayscale, flattened them to 1D arrays, and plotted them as histograms with bins=256:

import matplotlib.pyplot as plt

imageone = imgGray.flatten()  # array([0.64991451, 0.65775765, 0.66560078, ...,
imagetwo = imgGray2.flatten()

plt.hist(imageone, bins=256, label='image one')
plt.hist(imagetwo, bins=256, alpha=0.5, label='image two')
plt.legend(loc='upper left')
plt.show()

My next step is to call the ks_2samp function from scipy to calculate the divergence, but I'm unclear about what arguments to use.

A previous answer explained that we should "take the histogram of the image (in gray scale) and then divide the histogram values by the total number of pixels in the image. This will result in the probability to find a gray value in the image."

Ref: Can Kullback-Leibler be applied to compare two images?

But what is meant by "take the histogram values"? How do I 'take' these values?

I might be overcomplicating things, but I'm confused by this.

Jean-Paul Azzopardi
  • Maybe you can explain more about the larger problem you are trying to solve. Working with histograms has the implication that all images which have the same histogram are identical -- depending on the goal you are working towards, that might or might not be desirable. – Robert Dodier Feb 06 '23 at 19:30
  • Just comparing two similar images, got a great answer from Matt Pitkin! – Jean-Paul Azzopardi Feb 06 '23 at 20:29

1 Answer


The hist function returns three values, the first of which is the array of values (i.e., the number counts) in each histogram bin. If you pass the density=True argument to hist, these values will instead be the probability density in each bin, i.e.:

prob1, _, _ = plt.hist(imageone, bins=256, density=True, label = 'image one') 
prob2, _, _ = plt.hist(imagetwo, bins=256, density=True, alpha = 0.5, label = 'image two')

You can then calculate the KL divergence using the scipy entropy function, which returns the KL divergence when it is given two distributions:

from scipy.stats import entropy

entropy(prob1, prob2)
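
One caveat worth adding: for the KL divergence to be meaningful, both histograms should be built over the same bin edges, and entropy returns inf if the second distribution has empty bins where the first does not. Below is a minimal sketch of one way to handle both points, assuming the grayscale arrays imgGray and imgGray2 from the question have values in [0, 1]; the explicit bin edges and the epsilon smoothing are my own additions, not part of the original answer.

import numpy as np
from scipy.stats import entropy

# Shared bin edges so both probability arrays refer to the same bins.
bins = np.linspace(0.0, 1.0, 257)  # 256 equal-width bins over [0, 1]

counts1, _ = np.histogram(imgGray.flatten(), bins=bins)
counts2, _ = np.histogram(imgGray2.flatten(), bins=bins)

# "Take the histogram values": divide the counts by the total number of pixels.
p = counts1 / counts1.sum()
q = counts2 / counts2.sum()

# Optional smoothing so empty bins do not make the divergence infinite
# (the value of eps is an arbitrary choice).
eps = 1e-10
p = (p + eps) / (p + eps).sum()
q = (q + eps) / (q + eps).sum()

print(entropy(p, q))  # KL(p || q), natural-log base by default

Note that entropy normalises its inputs to sum to one, so passing the raw counts (or the density=True values from hist) would also work; the explicit division just makes the "divide by the total number of pixels" step visible.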
Matt Pitkin