I want to normalize a custom dataset of images. For that I need to compute the mean and standard deviation by iterating over the dataset. How can I normalize my entire dataset before creating the dataset?
What normalization tries to do is maintain the overall information in your dataset even when the values differ. For images, it tries to set aside issues like brightness and contrast, which in certain cases do not contribute to the information the image carries. There are several ways to do this, each with pros and cons depending on the image set you have and how much processing effort you want to spend. To name a few:
- Linear histogram stretching: apply a linear map to the current range of values in your image so that it stretches to the full 0 to 255 range of each RGB channel.
- Nonlinear histogram stretching: use a nonlinear function to map the input pixels to a new image. Commonly used functions are logarithms and exponentials. My favorite is the cumulative distribution function (CDF) of the original histogram, i.e. histogram equalization; it works pretty well.
- Adaptive histogram equalization: do the stretching locally, region by region, to avoid an identity mapping when the original image as a whole already spans the maximum range of values.
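A minimal NumPy sketch of the first two techniques, assuming 8-bit images already loaded as (H, W, 3) `uint8` arrays. The function names are hypothetical, and the CDF mapping is the standard histogram-equalization formula. Adaptive equalization is left out because it needs tiling plus interpolation between tiles; OpenCV, for instance, provides a contrast-limited variant through `cv2.createCLAHE`.

```python
import numpy as np


def linear_stretch(channel: np.ndarray) -> np.ndarray:
    """Linear histogram stretching: map [min, max] of one uint8 channel onto [0, 255]."""
    c = channel.astype(np.float64)
    lo, hi = c.min(), c.max()
    if hi == lo:  # flat channel: there is no range to stretch
        return channel.copy()
    return np.round((c - lo) * 255.0 / (hi - lo)).astype(np.uint8)


def equalize_histogram(channel: np.ndarray) -> np.ndarray:
    """Nonlinear stretching with the histogram's CDF (histogram equalization)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]  # CDF at the darkest intensity that occurs
    if cdf[-1] == cdf_min:  # only one intensity present: leave unchanged
        return channel.copy()
    lut = np.round((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min))
    lut = np.clip(lut, 0, 255).astype(np.uint8)  # bins below the minimum clip to 0
    return lut[channel]  # look up the new value of every pixel


def apply_per_channel(img: np.ndarray, fn) -> np.ndarray:
    """Apply a single-channel function to each channel of an (H, W, C) image."""
    return np.stack([fn(img[:, :, k]) for k in range(img.shape[2])], axis=2)


# Example on a synthetic low-contrast image (values 50..149 only).
rng = np.random.default_rng(0)
img = rng.integers(50, 150, size=(64, 64, 3), dtype=np.uint8)
stretched = apply_per_channel(img, linear_stretch)
equalized = apply_per_channel(img, equalize_histogram)
print(stretched.min(), stretched.max(), equalized.min(), equalized.max())
```

Both versions work channel by channel, so each RGB channel is stretched independently; that matches the "0 and 255 values in RGB" description but can shift colors slightly.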

SalvadorViramontes
Well, let's take an example image, test.jpg:
The first thing you need to do is decide which library you want to use: Pillow or OpenCV. In this example I'll use Pillow:
from PIL import Image
import numpy as np
img = Image.open("test.jpg")
pix = np.asarray(img.convert("RGB")) # Convert to RGB and load the pixels as an (H, W, 3) uint8 array
Rchan = pix[:,:,0] # Red color channel
Gchan = pix[:,:,1] # Green color channel
Bchan = pix[:,:,2] # Blue color channel
Rchan_mean = Rchan.mean()
Gchan_mean = Gchan.mean()
Bchan_mean = Bchan.mean()
Rchan_var = Rchan.var()
Gchan_var = Gchan.var()
Bchan_var = Bchan.var()
And the results are:
- Red Channel Mean: 134.80585625
- Red Channel Variance: 3211.35843945
- Green Channel Mean: 81.0884125
- Green Channel Variance: 1672.63200823
- Blue Channel Mean: 68.1831375
- Blue Channel Variance: 1166.20433566
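The code above gives the statistics of one image. To get dataset-wide statistics as the question asks, one option is to accumulate per-channel sums and sums of squares over all pixels of all images in a single pass. This is a minimal sketch, assuming each image is already an (H, W, 3) array; `dataset_channel_stats` is a hypothetical name.

```python
import numpy as np


def dataset_channel_stats(images):
    """Per-channel mean and standard deviation over every pixel of every image.

    `images` is any iterable of (H, W, 3) arrays (sizes may differ). Running
    sums are kept in float64, so a single pass suffices and each image can be
    discarded after it is processed.
    """
    n = 0
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    for img in images:
        px = np.asarray(img, dtype=np.float64).reshape(-1, 3)  # one row per pixel
        n += px.shape[0]
        total += px.sum(axis=0)
        total_sq += (px ** 2).sum(axis=0)
    if n == 0:
        raise ValueError("no images given")
    mean = total / n
    var = total_sq / n - mean ** 2  # population variance, same as ndarray.var()
    return mean, np.sqrt(np.maximum(var, 0.0))  # clamp tiny negative rounding error


rng = np.random.default_rng(1)
imgs = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
        for h, w in [(8, 10), (5, 7), (12, 3)]]
mean, std = dataset_channel_stats(imgs)
print(mean, std)
```

With real files, a generator such as `dataset_channel_stats(np.asarray(Image.open(p).convert("RGB")) for p in paths)` (where `paths` is a hypothetical list of file names) avoids holding the whole dataset in memory. For a single image the result equals the per-channel `mean()` and `np.sqrt(var())` values computed above. If your pipeline scales pixels to [0, 1] first, divide both results by 255.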
Hope it helps.

SalvadorViramontes
- Yes, but if you are going to normalize all the images, each one must be normalized with its own mean and variance. – SalvadorViramontes Nov 30 '18 at 00:16
- Are you sure? Why is that? Aren't we supposed to find the global mean and std and then normalize? – Sherlock Nov 30 '18 at 02:17
- The purpose of normalization is to have an image with mean and variance equal to 0 and 1, respectively. This is done by subtracting the mean value from each pixel and dividing the result by the standard deviation, so that each image has the mean and variance of a standard normal distribution. – SalvadorViramontes Nov 30 '18 at 15:15
- @Nightmerker please have a look at this: https://stackoverflow.com/questions/60101240/finding-mean-and-standard-deviation-across-image-channels-pytorch . If each image is normalized with its own mean and standard deviation, then why are the mean and std there computed across the entire dataset? – Amit JS May 14 '20 at 17:35
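The two positions in this thread can be compared side by side. This is a minimal sketch with hypothetical function names and a hypothetical small `eps` added to avoid division by zero. The per-image variant gives every image channel mean 0 and standard deviation 1, as described in the comment. The fixed-statistics variant applies one per-channel mean/std pair (for instance computed once over the whole dataset, as in the linked question and as `torchvision.transforms.Normalize(mean, std)` expects). It makes only the dataset as a whole roughly zero-mean and unit-variance, so individual images keep their relative brightness differences.

```python
import numpy as np


def standardize_per_image(img, eps=1e-8):
    """Per-image, per-channel standardization: (x - mean) / std of this image."""
    x = np.asarray(img, dtype=np.float64)
    mean = x.mean(axis=(0, 1))  # one mean per channel
    std = x.std(axis=(0, 1))    # one std per channel
    return (x - mean) / (std + eps)


def standardize_with_dataset_stats(img, mean, std, eps=1e-8):
    """Apply one fixed per-channel mean/std (e.g. computed over the whole dataset)."""
    x = np.asarray(img, dtype=np.float64)
    return (x - np.asarray(mean, dtype=np.float64)) / (np.asarray(std, dtype=np.float64) + eps)


rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
z = standardize_per_image(img)
g = standardize_with_dataset_stats(img, [100.0, 100.0, 100.0], [50.0, 50.0, 50.0])
print(z.mean(axis=(0, 1)), z.std(axis=(0, 1)))
```

Which one fits depends on whether absolute brightness differences between images carry information for your task; the thread does not settle that.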