Why do I need to pass the previous number of channels to the batchnorm? The batchnorm should normalize over each datapoint in the batch, so why does it need the number of channels?
1 Answer
Batch normalisation has learnable parameters because it includes an affine transformation.
From the documentation of nn.BatchNorm2d:
The mean and standard-deviation are calculated per-dimension over the mini-batches and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0.
Since the normalisation is calculated per channel, the parameters γ and β are vectors of size num_channels (one element per channel), which results in an individual scale and shift per channel. As with any other learnable parameter in PyTorch, they need to be created with a fixed size, hence you need to specify the number of channels:
import torch.nn as nn

batch_norm = nn.BatchNorm2d(10)
# γ
batch_norm.weight.size()
# => torch.Size([10])
# β
batch_norm.bias.size()
# => torch.Size([10])
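To see why the number of channels matters for the normalisation itself, here is a minimal sketch (assuming a random input of shape [N, C, H, W]) that reproduces what BatchNorm2d does in training mode by computing the per-channel statistics by hand:

import torch
import torch.nn as nn

x = torch.randn(4, 10, 8, 8)   # batch of 4, 10 channels, 8x8 feature maps
bn = nn.BatchNorm2d(10)
bn.train()

# per-channel mean and (biased) variance over the batch and spatial dimensions
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

# normalise, then apply the per-channel scale γ and shift β
manual = (x - mean) / torch.sqrt(var + bn.eps)
manual = manual * bn.weight.view(1, -1, 1, 1) + bn.bias.view(1, -1, 1, 1)

print(torch.allclose(bn(x), manual, atol=1e-5))
# => True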
Note: Setting affine=False does not use any parameters and the number of channels wouldn't be needed for them, but it is still required in order to have a consistent interface.
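A quick way to check this (a small sketch, not part of the original answer): with affine=False the weight and bias attributes are simply None, so no per-channel parameters are allocated.

bn_no_affine = nn.BatchNorm2d(10, affine=False)
print(bn_no_affine.weight, bn_no_affine.bias)
# => None None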

- Ahh okay! Thank you for this well explained enlightenment :D – TheBenimeni May 27 '20 at 13:22
- Just a tiny addition: If you did set affine=False you would actually still need the number of channels since they are also used to initialize buffers for storing the running stats. That can be seen in the code here: https://pytorch.org/docs/stable/_modules/torch/nn/modules/batchnorm.html#_NormBase Now, if you *also* set track_running_stats=False, then I agree that you would not need the number of channels as a parameter. – Leon Z. Apr 13 '23 at 15:36
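As a small illustration of that comment (a sketch using the same nn.BatchNorm2d module as above): the running-stat buffers are still sized per channel when affine=False, and only disabling track_running_stats as well removes them.

bn = nn.BatchNorm2d(10, affine=False)
print(bn.running_mean.size())
# => torch.Size([10])

bn_stateless = nn.BatchNorm2d(10, affine=False, track_running_stats=False)
print(bn_stateless.running_mean)
# => None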