
I am following some tutorials and I keep seeing different numbers that seem quite arbitrary to me in the transforms section

namely,

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

or

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

or

transform = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

or others.

I wonder where these numbers arise, and how to know to select the correct ones?

I am about to use MNIST for sanity, but very soon I will use my own unique dataset and will probably need my own normalization.

Gulzar

2 Answers


Normalize, in the PyTorch context, subtracts the mean (the first number) from each instance (each MNIST image, in your case) and divides by the standard deviation (the second number). This happens per channel, so for MNIST you only need one mean and one std, because the images are grayscale; for, say, CIFAR-10, which has color images, you would use something along the lines of your last transform (3 numbers for the mean and 3 for the std).

So basically each input image in MNIST first gets transformed from [0, 255] to [0, 1] because you convert the image to a Tensor (source: https://pytorch.org/docs/stable/torchvision/transforms.html -- "Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8").

After that, you want your input image to have values in a range like [0, 1] or [-1, 1] to help your model converge in the right direction (there are many reasons why scaling takes place, e.g. NNs prefer inputs around that range to avoid gradient saturation). Now, as you probably noticed, passing 0.5 and 0.5 to Normalize yields values in the range:

Min of input image = 0 -> ToTensor -> 0 -> (0 - 0.5) / 0.5 -> -1

Max of input image = 255 -> ToTensor -> 1 -> (1 - 0.5) / 0.5 -> 1

so it transforms your data to the range [-1, 1].
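To make the arithmetic concrete, here is a minimal sketch in plain Python (no torchvision needed) of what ToTensor followed by Normalize((0.5,), (0.5,)) does to a grayscale pixel; torchvision applies the same math per channel:

```python
def to_tensor(pixel):
    """ToTensor's scaling step: uint8 pixel in [0, 255] -> float in [0.0, 1.0]."""
    return pixel / 255.0

def normalize(x, mean, std):
    """Normalize: subtract the channel mean, divide by the channel std."""
    return (x - mean) / std

# With mean=0.5, std=0.5 the extremes land exactly on [-1, 1]:
lo = normalize(to_tensor(0), mean=0.5, std=0.5)    # -> -1.0
hi = normalize(to_tensor(255), mean=0.5, std=0.5)  # ->  1.0

# With the MNIST statistics, a pixel sitting at the dataset mean maps to 0:
at_mean = normalize(0.1307, mean=0.1307, std=0.3081)  # -> 0.0
print(lo, hi, at_mean)
```

The same two-step pipeline is what `transforms.Compose([transforms.ToTensor(), transforms.Normalize(...)])` runs on every image.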

Gaussian Prior
  • thanks :) This still doesn't explain the other "magic" numbers. What is `((0.1307,), (0.3081,))` for example? – Gulzar Dec 27 '20 at 16:18
  • Sorry. It is explained here that these are the mean and std of the MNIST (train) dataset: https://discuss.pytorch.org/t/normalization-in-the-mnist-example/457 -- so when you want to scale your data to 0 mean and 1 std you have to calculate them yourself, but for MNIST this has already been done – Gaussian Prior Dec 27 '20 at 16:31
  • 2
    Coming back to this, I still don't understand when I should use which numbers. If the std and mean are `(0.1307,), (0.3081,)`, why use anything else? The 0.5, 0.5 example is mnist as well – Gulzar Mar 24 '22 at 14:04
  • Also, what about updating datasets? Do I have to recompute these numbers every time I update my dataset, before training? – Gulzar Mar 24 '22 at 14:07
  • If you want to apply min-max scaling to an image dataset, then 0.5, 0.5 would do; otherwise, applying the mean and std calculation from https://pytorch.org/vision/stable/models.html before training should do it, even for updated datasets. In general there's no right answer: you may go for MinMaxScaling or for StandardScaling (with (0.13, 0.30)); it depends on the problem, I'd say, since these are 2 different preprocessing techniques. – Gaussian Prior Mar 25 '22 at 14:05
  • The reason I have not accepted the answer is that it doesn't contain the essence, which in my opinion is what we talked about in the comments, and covers it too briefly. – Gulzar Apr 01 '23 at 09:57
  • Accepting it makes no difference to me. Also, I'm probably super bad at conveying information to other people, but in any case the point of the answer is: 1) NNs want small numbers around [-1, 1], for many reasons. 2) Many datasets (MNIST) have features with large values (124, 251, 255). 3) You need a way to convert large values to, say, [-1, 1]. 4) Say you have feature_1, feature_2 | target: subtracting the mean of feature_1 from every instance and dividing by the std converts feature_1 values to numbers around 0. Same for feature_2. You may decide to do something else, but this is the most common. – Gaussian Prior Apr 03 '23 at 20:15
  • 5) To do that you would have to calculate the mean and std of feature_1 and feature_2 from your dataset. PyTorch assumes you already did that, and accepts these as inputs to Normalize. 6) Now, for inference with your NN, what do you do on your test set? You don't calculate the mean and std from the test set, because this would add noise to your inputs. If the (mean, std) from test were different, you would normalize the same feature_1 value in 2 different ways, basically training and then testing with different inputs. This doesn't work. For your test set, you use the mean and std from train. – Gaussian Prior Apr 03 '23 at 20:23
  • 1
    For updating datasets, whenever you retrain your network, you calculate these numbers again. There's no limitation to that. Say mean_2, std_2. Then for testing you use mean_2 and std_2 and so on. If you want any other normalization you are free to do so. You just need to map everything reasonably to a small range around 0. If you need additional info, then I probably misunderstood some parts of your question and/or comments – Gaussian Prior Apr 03 '23 at 20:25
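The procedure described in the comments — compute the mean and std on the training set once, then reuse those same numbers for the test set — can be sketched in plain Python. The tiny "dataset" here is made up for illustration; in practice you would iterate over your real training images (e.g. via a DataLoader) and do the same arithmetic per channel:

```python
def channel_stats(images):
    """Per-channel mean and std over a list of 2D grayscale images,
    where each image is a list of rows of floats already in [0, 1]
    (i.e. after ToTensor's scaling)."""
    pixels = [p for img in images for row in img for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, var ** 0.5

# Hypothetical training set: two 2x2 grayscale images.
train_images = [
    [[0.0, 0.25], [0.5, 0.25]],
    [[1.0, 0.75], [0.5, 0.75]],
]
mean, std = channel_stats(train_images)

# These numbers would then go into Normalize((mean,), (std,)) --
# and the SAME train-set mean/std are reused at test time.
print(mean, std)
```

For MNIST, doing this over the full training set is what yields the well-known (0.1307, 0.3081); for a color dataset you would compute three means and three stds, one pair per channel.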

These specific numbers

(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

are the per-channel mean and std of the ImageNet dataset. They appear so often because torchvision models are usually pretrained on ImageNet, so inputs to those models should be normalized with the same statistics.

Oldman
  • This doesn't explain anything. – Gulzar Apr 13 '23 at 15:45