I am new to CNNs and was implementing batch normalization in a CNN using Keras. The batch norm layer has 4 * Feature_map (of the previous layer) parameters, which are as follows:
- 2 are gamma and beta
- The other 2 are for the exponential moving average of the mean and variance of mini-batches
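To make the count above concrete, here is a minimal sketch in plain Python. The function name `batchnorm_param_counts` is made up for illustration and is not part of Keras. It assumes both gamma and beta are enabled (`scale=True`, `center=True`).

```python
def batchnorm_param_counts(num_feature_maps):
    """Return (trainable, non_trainable, total) counts for one batch norm layer.

    Hypothetical helper for illustration; assumes gamma and beta are both enabled.
    """
    trainable = 2 * num_feature_maps      # gamma and beta, one of each per feature map
    non_trainable = 2 * num_feature_maps  # moving mean and moving variance per feature map
    return trainable, non_trainable, trainable + non_trainable


print(batchnorm_param_counts(32))  # (64, 64, 128)
```

So for a previous layer with 32 feature maps, the layer would report 128 parameters, half of them non-trainable.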
Now, the exponential moving averages of the mean and variance are defined as:
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var
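The two update rules above can be sketched in plain Python for a single feature map. This is only an illustration of the formulas as I wrote them, not the Keras implementation: the helper name `update_running_stats` is hypothetical, and using the biased (population) variance of the batch is my assumption.

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.99):
    """Apply one exponential-moving-average update for a single feature map.

    Hypothetical helper; `batch` is a list of activations from one mini-batch.
    """
    n = len(batch)
    sample_mean = sum(batch) / n
    # Assumption: biased (population) variance of the mini-batch.
    sample_var = sum((x - sample_mean) ** 2 for x in batch) / n

    # The same momentum value is used in both updates.
    new_mean = momentum * running_mean + (1 - momentum) * sample_mean
    new_var = momentum * running_var + (1 - momentum) * sample_var
    return new_mean, new_var


# Start from 0 and 1, matching the 'zeros' / 'ones' initializers.
running_mean, running_var = 0.0, 1.0
for batch in ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):
    running_mean, running_var = update_running_stats(running_mean, running_var, batch)
```

Note that the single `momentum` argument controls how quickly both statistics track the mini-batch values.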
In the Keras BatchNormalization layer, I saw that there is just one hyperparameter for this, named momentum:
BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, **kwargs)
My question is: why are there not separate momentum hyperparameters for the running mean and the running variance?