Usually when we use nn.LayerNorm, we want to normalize across the whole of the given dimension(s); otherwise I think it would be called group norm. Why does the PyTorch documentation use LayerNorm like this?
>>> # Image Example
>>> import torch
>>> from torch import nn
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions
>>> # (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
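As far as I can tell, this normalizes each sample over all of (C, H, W) jointly. Here is a minimal check I sketched to convince myself (the names `mean`, `var`, `manual` are my own, and the match is exact only at initialization, while `weight` is all ones and `bias` all zeros):
>>> # At init (weight = 1, bias = 0), LayerNorm([C, H, W]) matches a
>>> # manual per-sample normalization over dims 1..3.
>>> mean = input.mean(dim=(1, 2, 3), keepdim=True)
>>> var = input.var(dim=(1, 2, 3), unbiased=False, keepdim=True)
>>> manual = (input - mean) / torch.sqrt(var + 1e-5)  # 1e-5 is the default eps
>>> torch.allclose(output, manual, atol=1e-5)
True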
What happens if I do not know the dimension sizes (C, H, W) ahead of time? How do I define a LayerNorm layer then?
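One workaround I have been considering (a sketch, not necessarily the intended approach): use the functional form and read the normalized shape off the input at call time. Note that `F.layer_norm` used this way has no learnable weight/bias:
>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.randn(20, 5, 10, 10)  # shapes here are just for illustration
>>> out = F.layer_norm(x, x.shape[1:])  # normalize over everything after the batch dim
>>> out.shape
torch.Size([20, 5, 10, 10])
Alternatively, if only the channel count C is known ahead of time, nn.GroupNorm(1, C) also normalizes each sample over all of (C, H, W), with per-channel affine parameters. Is either of these the right way to do it?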