I'm trying to copy pre-trained batch normalization (BN) weights from a PyTorch model to its equivalent Keras model, but I keep getting different outputs.
I read the Keras and PyTorch BN documentation, and I think the difference lies in the way they calculate the mean and variance.
PyTorch:
The mean and standard-deviation are calculated per-dimension over the mini-batches
source: PyTorch BatchNorm
So, as I read it, the statistics are averaged over the samples in the mini-batch.
Keras:
axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
source: Keras BatchNorm
So here they seem to average over the features (channels).
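If the Keras `axis` argument instead names the axis that is kept (one mean and variance per channel) rather than the axis that is averaged over, then both descriptions would reduce to the same computation. A minimal NumPy sketch of that reading, assuming an NCHW input; the array shape and variable names are made up for illustration and nothing here calls either library:

```python
import numpy as np

# Hypothetical NCHW activation: 4 samples, 3 channels, 5x5 spatial.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5, 5))

# The channel axis is the one that is kept (Keras `axis=1` for channels_first).
channel_axis = 1
# Every other axis (batch, height, width) is reduced: (0, 2, 3).
reduce_axes = tuple(i for i in range(x.ndim) if i != channel_axis)

mean = x.mean(axis=reduce_axes)  # shape (3,), one mean per channel
var = x.var(axis=reduce_axes)    # shape (3,), biased variance per channel
```

Under this reading, "per-dimension over the mini-batches" and "the axis that should be normalized" both describe per-channel statistics taken over the batch and spatial positions.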
What's the right way? How do I transfer the BN weights between the two models?
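For context, my current understanding (an assumption on my part) is that PyTorch's `weight`, `bias`, `running_mean` and `running_var` correspond to Keras's `gamma`, `beta`, `moving_mean` and `moving_variance`, and that the default epsilon differs between the two (1e-5 in PyTorch, 1e-3 in Keras). The NumPy sketch below reproduces inference-time BN with stored statistics and shows that a mismatch in epsilon alone is enough to change the output. `batchnorm_inference` is a hypothetical helper written for this question, not part of either library, and the parameter values are random placeholders:

```python
import numpy as np

def batchnorm_inference(x, gamma, beta, moving_mean, moving_var, eps):
    """Inference-time batch norm on an NCHW array using stored statistics."""
    shape = (1, -1, 1, 1)  # broadcast per-channel vectors over N, H, W
    x_hat = (x - moving_mean.reshape(shape)) / np.sqrt(moving_var.reshape(shape) + eps)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)

# Placeholder input and per-channel parameters (3 channels).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
gamma = rng.normal(size=3)
beta = rng.normal(size=3)
moving_mean = rng.normal(size=3)
moving_var = rng.uniform(0.5, 2.0, size=3)

# Same parameters, but each framework's default epsilon.
out_torch_eps = batchnorm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-5)
out_keras_eps = batchnorm_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3)

# Largest element-wise difference caused by epsilon alone.
max_gap = np.abs(out_torch_eps - out_keras_eps).max()
```

If this mapping is right, the Keras layer would presumably need the four arrays in the order gamma, beta, moving_mean, moving_variance (the order `get_weights()` returns them in), with `epsilon` set to match the PyTorch layer's `eps`.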