Normalize in the PyTorch context subtracts from each instance (an MNIST image in your case) the mean (the first number) and divides by the standard deviation (the second number). This takes place for each channel separately, meaning that for MNIST you only need one mean and one std because the images are grayscale, but on, say, CIFAR-10, which has colored images, you would use something along the lines of your last transform (3 numbers for mean and 3 for std).
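For example (just a sketch, with 0.5 used everywhere as a placeholder; you could also plug in the actual per-channel dataset statistics):

    from torchvision import transforms

    # Grayscale MNIST: one channel, so one mean and one std
    mnist_normalize = transforms.Normalize((0.5,), (0.5,))

    # RGB CIFAR-10: three channels, so three means and three stds
    cifar_normalize = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))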
So basically each input image in MNIST gets transformed from [0, 255] to [0, 1] because ToTensor does that conversion when you transform the image to a Tensor (source: https://pytorch.org/docs/stable/torchvision/transforms.html):
"Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8"
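You can check that yourself on a dummy uint8 array, something like:

    import numpy as np
    from torchvision import transforms

    # A dummy 28x28 single-channel uint8 "image" (H x W x C) in [0, 255]
    img = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)

    tensor = transforms.ToTensor()(img)
    print(tensor.shape)                              # torch.Size([1, 28, 28])
    print(tensor.min().item(), tensor.max().item())  # values now inside [0.0, 1.0]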
After that, you want your input image to have values in a range like [0, 1] or [-1, 1] to help your model converge in the right direction (there are many reasons why scaling takes place, e.g. NNs prefer inputs around that range to avoid gradient saturation). Now, as you probably noticed, passing 0.5 and 0.5 to Normalize yields values in the range:
Min of input image = 0 -> ToTensor -> 0.0 -> (0.0 - 0.5) / 0.5 -> -1
Max of input image = 255 -> ToTensor -> 1.0 -> (1.0 - 0.5) / 0.5 -> 1
so it transforms your data into the range [-1, 1].
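If you want to verify the whole pipeline end to end, a quick sketch like this (the root path is just an assumed download location) should print -1 and 1, since MNIST digits contain both pure black and pure white pixels:

    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.ToTensor(),                 # uint8 [0, 255] -> float [0.0, 1.0]
        transforms.Normalize((0.5,), (0.5,)),  # [0.0, 1.0] -> [-1.0, 1.0]
    ])

    mnist = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
    img, _ = mnist[0]
    print(img.min().item(), img.max().item())  # -1.0 1.0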