
I have built a convolutional neural network for a binary classification problem and am now interpreting the results. I'm creating saliency maps to identify the regions of the image that are important to the prediction. However, I find that the saliency maps are very different depending on whether or not I include batch normalisation.

Intuitively, it seems that when batch normalisation is included the maps trace regions of the image that are less important to the prediction, even though the models with and without batch normalisation perform similarly.

Does anyone know why batch norm would change the saliency maps, or whether there is anything that needs to be handled separately when computing the gradient of the positive-class output with respect to the inputs when batch norm is included?
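
For concreteness, this is roughly the kind of vanilla gradient saliency computation I mean. It's a minimal sketch in PyTorch; the framework and the `model`/`image` names are placeholders rather than my actual code:

```python
import torch

def saliency_map(model, image):
    """Vanilla gradient saliency: |d(positive-class score) / d(input pixels)|."""
    model.eval()  # batch norm runs in inference mode, using its moving statistics
    x = image.detach().clone().unsqueeze(0).requires_grad_(True)  # (1, C, H, W)
    score = model(x).squeeze()  # assumes a single logit for the positive class
    score.backward()            # gradient of the score w.r.t. the input pixels
    return x.grad.abs().squeeze(0).max(dim=0).values  # (H, W) map, max over channels

# usage: smap = saliency_map(model, img)   # img: float tensor of shape (C, H, W)
```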

I have tried both including and excluding batch norm in the model, and I have also compared using a normalisation layer inside the network against normalising the inputs outside the model. The normalisation layer has no effect on the saliency maps, but batch norm does.
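
To clarify what I mean by the two normalisation setups, here is an illustrative PyTorch sketch (again not my actual code; the mean/std values and module names are placeholders):

```python
import torch
import torch.nn as nn

class NormalisedCNN(nn.Module):
    """Option A: fixed input normalisation as a layer inside the network."""
    def __init__(self, backbone, mean, std):
        super().__init__()
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))
        self.backbone = backbone

    def forward(self, x):
        return self.backbone((x - self.mean) / self.std)

def normalise(x, mean, std):
    """Option B: the same normalisation applied to the batch before calling the model."""
    mean = torch.tensor(mean).view(1, -1, 1, 1)
    std = torch.tensor(std).view(1, -1, 1, 1)
    return (x - mean) / std
```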

Thanks!
