9

How to do histogram equalization for multiple grayscaled images stored in a NumPy array easily?

I have the 96x96 pixel NumPy data in this 4D format:

(1800, 1, 96,96)
NoDataDumpNoContribution
  • 10,591
  • 9
  • 64
  • 104
pbu
  • 2,982
  • 8
  • 44
  • 68
  • 2
    Does this help: https://web.archive.org/web/20151219221513/http://www.janeriksolem.net/2009/06/histogram-equalization-with-python-and.html ? – Martin Thoma Feb 14 '15 at 18:25
  • Thanks moose. That helps ofcourse with PIL. Are there any pure numpy solution?? as it could be faster as i am doing it for 10,000 images. PIL and Skimage usually takes a long time. – pbu Feb 14 '15 at 19:59
  • 1
    @pbu PIL is needed only for reading of the image data in the example. Since you have the data already you don't need it, only NumPy. See the code I have written in the anser. Also many thanks to moose for linking the solution! – NoDataDumpNoContribution Feb 14 '15 at 21:22

4 Answers4

18

Moose's comment which points to this blog entry does the job quite nicely.

For completeness, I give an example here using nicer variable names and a looped execution on 1000 96x96 images which are in a 4D array as in the question. It is fast (1-2 seconds on my computer) and only needs NumPy.

import numpy as np

def image_histogram_equalization(image, number_bins=256):
    # from http://www.janeriksolem.net/histogram-equalization-with-python-and.html

    # get image histogram
    image_histogram, bins = np.histogram(image.flatten(), number_bins, density=True)
    cdf = image_histogram.cumsum() # cumulative distribution function
    cdf = (number_bins-1) * cdf / cdf[-1] # normalize

    # use linear interpolation of cdf to find new pixel values
    image_equalized = np.interp(image.flatten(), bins[:-1], cdf)

    return image_equalized.reshape(image.shape), cdf

if __name__ == '__main__':

    # generate some test data with shape 1000, 1, 96, 96
    data = np.random.rand(1000, 1, 96, 96)

    # loop over them
    data_equalized = np.zeros(data.shape)
    for i in range(data.shape[0]):
        image = data[i, 0, :, :]
        data_equalized[i, 0, :, :] = image_histogram_equalization(image)[0]
NoDataDumpNoContribution
  • 10,591
  • 9
  • 64
  • 104
  • 1
    For those who are wondering, normalized-histogram of an image which is histogram of the image divided by total number of pixels in image, can be thought of as the probability density function of each gray level, and that's exactly what `density=True` does. so here `image_histogram` is actualy the normalized-histogram. – mh sattarian Apr 06 '19 at 22:19
  • 1
    This is fantastic, thank you for the copy-pastable code. Exactly what I needed. Also +1 because this works with N-dimensional arrays, not just images. – Nick Crews Jan 20 '22 at 20:49
  • 1
    Shouldn't the 255 in `cdf = 255 * cdf / cdf[-1]` be `(number_bins - 1)` instead? As it is, it's a bit inconsistent. – Dominik Stańczak Aug 20 '22 at 16:15
  • @DominikStańczak0 Thank you for that observation. You're right and I edited the code. – NoDataDumpNoContribution Aug 22 '22 at 12:49
4

Very fast and easy way is to use the cumulative distribution function provided by the skimage module. Basically what you do mathematically to proof it.

from skimage import exposure
import numpy as np
def histogram_equalize(img):
    img = rgb2gray(img)
    img_cdf, bin_centers = exposure.cumulative_distribution(img)
    return np.interp(img, bin_centers, img_cdf)
Dodo
  • 137
  • 5
1

As of today janeriksolem's url is broken.

I found however this gist that links the same page and claims to perform histogram equalization without computing the histogram.

The code is:

img_eq = np.sort(img.ravel()).searchsorted(img)
Diego Ferri
  • 2,657
  • 2
  • 27
  • 35
1

Here's an alternate implementation for a single channel image that is fast. See skimage.exposure.histogram for reference. Using timeit, 'image_histogram_equalization' in Trilarion's answer has a mean execution time was 0.3696 seconds, while this function has a mean execution time of 0.0534 seconds. However this implementation also relies on skimage.

import numpy as np
from skimage import exposure

def hist_eq(image):
    hist, bins = exposure.histogram(image, nbins=256, normalize=False)
    # append any remaining 0 values to the histogram
    hist = np.hstack((hist, np.zeros((255 - bins[-1])))) 
    cdf = 255*(hist/hist.sum()).cumsum()
    equalized = cdf[image].astype(np.uint8)

    return equalized