PyTorch: Normalize Image data set

Question

I want to normalize a custom dataset of images. For that I need to compute the mean and standard deviation by iterating over the dataset. How can I normalize my entire dataset before creating the data set?

score 1 · Answer 1 · answered Nov 29 '18 at 16:34

Normalization tries to preserve the overall information in your dataset even when the raw values differ. For images, it tries to factor out properties such as brightness and contrast that in some cases do not contribute to the information the image carries. There are several ways to do this, each with pros and cons depending on your image set and how much processing effort you want to spend on it. To name a few:

Linear histogram stretching: you apply a linear map to the current range of values in your image and stretch it to span the full 0 to 255 range of each RGB channel.
Nonlinear histogram stretching: you use a nonlinear function to map the input pixels to a new image. Commonly used functions are logarithms and exponentials. My favorite function is the cumulative probability function of the original histogram; it works pretty well.
Adaptive histogram equalization: you apply histogram stretching locally, in certain regions of your image, so that the mapping does not degenerate into an identity mapping when the original image already spans the maximum range of values.
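
As a rough illustration of the first two techniques, here is a minimal NumPy sketch for a single 8-bit channel. The function names `linear_stretch` and `equalize` are hypothetical, and the low-contrast test image is synthetic. Adaptive equalization is not shown; it would apply a similar mapping per region rather than once for the whole image.

```python
import numpy as np

def linear_stretch(channel):
    """Linearly map the channel's [min, max] range onto [0, 255]."""
    channel = channel.astype(np.float64)
    lo, hi = channel.min(), channel.max()
    if hi == lo:  # flat image: there is no range to stretch
        return np.zeros(channel.shape, dtype=np.uint8)
    return np.round((channel - lo) / (hi - lo) * 255).astype(np.uint8)

def equalize(channel):
    """Map each pixel through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum() / channel.size          # values in (0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)  # lookup table for 0..255
    return lut[channel]

# Synthetic low-contrast channel with values confined to 100..150.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 151, size=(64, 64), dtype=np.uint8)

stretched = linear_stretch(low_contrast)
equalized = equalize(low_contrast)
```

Both functions return a `uint8` array of the same shape as the input, so they can be applied to each of the R, G and B channels in turn.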

score 1 · Accepted Answer · answered Nov 29 '18 at 23:22

Well, let's take this image as an example:

The first thing you need to do is decide which library you want to use: Pillow or OpenCV. In this example I'll use Pillow:

from PIL import Image
import numpy as np

img = Image.open("test.jpg")
pix = np.asarray(img.convert("RGB")) # Convert to RGB and load as an H x W x 3 array

Rchan = pix[:,:,0]  # Red color channel
Gchan = pix[:,:,1]  # Green color channel
Bchan = pix[:,:,2]  # Blue color channel

Rchan_mean = Rchan.mean()
Gchan_mean = Gchan.mean()
Bchan_mean = Bchan.mean()

Rchan_var = Rchan.var()
Gchan_var = Gchan.var()
Bchan_var = Bchan.var()

And the results are:

Red Channel Mean: 134.80585625
Red Channel Variance: 3211.35843945
Green Channel Mean: 81.0884125
Green Channel Variance: 1672.63200823
Blue Channel Mean: 68.1831375
Blue Channel Variance: 1166.20433566

Hope it helps for your needs.
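
The same per-channel statistics can also be computed in one call by reducing over the height and width axes, and the standard deviation is simply the square root of the variance. The sketch below assumes a small synthetic array in place of test.jpg, and shows how those values would then be used to standardize that one image.

```python
import numpy as np

# Synthetic 4x4 RGB image standing in for test.jpg (assumption for illustration).
rng = np.random.default_rng(0)
pix = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Reduce over height and width (axes 0 and 1): one value per color channel.
chan_mean = pix.mean(axis=(0, 1))
chan_var = pix.var(axis=(0, 1))
chan_std = np.sqrt(chan_var)

# Standardize: subtract the channel mean and divide by the channel std.
# The (3,) statistics broadcast across the last axis of the (4, 4, 3) image.
standardized = (pix.astype(np.float64) - chan_mean) / chan_std
```

After this step each channel of `standardized` has mean 0 and standard deviation 1. A channel with constant values would have a standard deviation of 0, so real code would need to guard against that division.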

SalvadorViramontes

Yes, but if you are going to normalize all the images, each one must be normalized with its own mean and variance – SalvadorViramontes Nov 30 '18 at 00:16
Are you sure? Why is that? Aren't we supposed to find the global mean and std and then normalize with those? – Sherlock Nov 30 '18 at 02:17
The purpose of normalization is to have an image with mean and variance equal to 0 and 1, respectively. This is done to bring each image closer to a normal distribution by subtracting the mean value from each pixel and dividing the result by the standard deviation. – SalvadorViramontes Nov 30 '18 at 15:15
@Nightmerker please have a look at this https://stackoverflow.com/questions/60101240/finding-mean-and-standard-deviation-across-image-channels-pytorch . If each image is normalized with its own mean and standard deviation, then why does that answer sum across the entire data before taking the mean and std? – Amit JS May 14 '20 at 17:35
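
For the dataset-wide alternative raised in these comments, one possible sketch is to accumulate per-channel sums and sums of squares while iterating over the images, so that the whole dataset never has to be in memory at once. The helper name `dataset_mean_std` and the synthetic images are assumptions for illustration. Pixel values are scaled to [0, 1] here, on the assumption that the statistics will be applied to images scaled the same way.

```python
import numpy as np

def dataset_mean_std(images):
    """Per-channel mean and std over every pixel of every image.

    `images` is any iterable of H x W x 3 arrays; image sizes may differ.
    """
    total = np.zeros(3)
    total_sq = np.zeros(3)
    n_pixels = 0
    for img in images:
        x = np.asarray(img, dtype=np.float64) / 255.0  # scale to [0, 1]
        total += x.sum(axis=(0, 1))
        total_sq += (x ** 2).sum(axis=(0, 1))
        n_pixels += x.shape[0] * x.shape[1]
    mean = total / n_pixels
    # Var = E[x^2] - E[x]^2; clip guards against tiny negative rounding errors.
    var = np.clip(total_sq / n_pixels - mean ** 2, 0.0, None)
    return mean, np.sqrt(var)

# Synthetic stand-in for a custom dataset: three RGB images of different sizes.
rng = np.random.default_rng(0)
sizes = [(8, 8), (4, 6), (5, 3)]
images = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8) for h, w in sizes]

mean, std = dataset_mean_std(images)
```

In a PyTorch pipeline, values computed this way are typically what gets passed as the `mean` and `std` arguments of `torchvision.transforms.Normalize`, applied after `ToTensor` (which also scales pixels to [0, 1]). Whether per-image or dataset-wide statistics are the better choice depends on the task; this sketch only shows how the global variant can be computed.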
