I am training a model with PyTorch in which I need to calculate the degree of dependence between two tensors as part of my loss function (say, two tensors whose values are each very close to zero or one, e.g. v1 = [0.999, 0.998, 0.001, 0.98] and v2 = [0.97, 0.01, 0.997, 0.999]). I am trying to calculate mutual information, but I can't find any mutual information estimation implementation in PyTorch. Has such a thing been provided anywhere?
2 Answers
Mutual information is defined for distributions, not individual points. So, I will write the next part assuming v1 and v2 are samples from a distribution p. I will also assume that you have n samples from p, with n > 1.
You want a method to estimate mutual information from samples. There are many ways to do this. One of the simplest would be to use a non-parametric estimator like NPEET (https://github.com/gregversteeg/NPEET). It works with numpy (you can convert from torch to numpy for this). There are more involved parametric models for which you may be able to find PyTorch implementations (see https://arxiv.org/abs/1905.06922).
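For example, a minimal sketch of calling NPEET on the vectors from the question (assuming NPEET's `entropy_estimators.mi` interface, which takes per-variable arrays of samples; note the torch-to-numpy conversion detaches from autograd, so this is not usable inside a differentiable loss):

```python
import torch
from npeet import entropy_estimators as ee  # https://github.com/gregversteeg/NPEET

v1 = torch.tensor([0.999, 0.998, 0.001, 0.98])
v2 = torch.tensor([0.97, 0.01, 0.997, 0.999])

# NPEET works on numpy arrays/lists, so detach and convert.
# Each variable is an (n, d) array of samples; here d = 1.
x = v1.detach().cpu().numpy().reshape(-1, 1)
y = v2.detach().cpu().numpy().reshape(-1, 1)

print(ee.mi(x, y, k=3))  # Kraskov-style kNN estimate of I(X; Y)
```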
If you only have two vectors and want to compute a similarity measure, a dot-product similarity would be more suitable than mutual information, since a single pair of vectors does not define a distribution.
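For instance, with the vectors from the question:

```python
import torch
import torch.nn.functional as F

v1 = torch.tensor([0.999, 0.998, 0.001, 0.98])
v2 = torch.tensor([0.97, 0.01, 0.997, 0.999])

dot = torch.dot(v1, v2)                   # raw dot product
cos = F.cosine_similarity(v1, v2, dim=0)  # normalized (scale-invariant) variant
```

Both operations are differentiable, so either can be used directly in a loss.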

- Yes, I meant two vectors that are sampled from two probability distributions. Since I'm using PyTorch and need differentiability, I can't use NPEET (it uses sklearn functions). – Arsalan May 21 '22 at 20:32
- In that case, you should see the referenced paper. It presents different estimators for lower and upper bounds; depending on whether your optimization problem is a maximization or a minimization, you can choose one. – Umang Gupta May 22 '22 at 17:29
It is not provided in the official PyTorch code, but here is a PyTorch implementation that uses kernel density estimation for the histogram approximation. Note that this method is fully differentiable.
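The core idea can be sketched as follows (my own minimal version, not the linked code): smooth each sample into soft bin assignments with a Gaussian kernel, build the joint histogram from those assignments, and evaluate the usual MI sum. The bin range, bandwidth, and bin count below are illustrative assumptions:

```python
import torch

def kde_mutual_information(x, y, num_bins=16, sigma=0.1, eps=1e-10):
    """Differentiable MI estimate between two 1-D tensors with values
    in [0, 1], using Gaussian-kernel soft histograms."""
    bins = torch.linspace(0.0, 1.0, num_bins, device=x.device)
    # Soft-assign each sample to every bin with a Gaussian kernel: (n, bins)
    wx = torch.exp(-0.5 * ((x.unsqueeze(1) - bins) / sigma) ** 2)
    wy = torch.exp(-0.5 * ((y.unsqueeze(1) - bins) / sigma) ** 2)
    wx = wx / (wx.sum(dim=1, keepdim=True) + eps)
    wy = wy / (wy.sum(dim=1, keepdim=True) + eps)
    # Joint histogram: mean over samples of the outer product of assignments
    p_xy = (wx.unsqueeze(2) * wy.unsqueeze(1)).mean(dim=0)  # (bins, bins)
    p_x, p_y = p_xy.sum(dim=1), p_xy.sum(dim=0)             # marginals
    return (p_xy * (torch.log(p_xy + eps)
                    - torch.log(p_x.unsqueeze(1) + eps)
                    - torch.log(p_y.unsqueeze(0) + eps))).sum()

v1 = torch.tensor([0.999, 0.998, 0.001, 0.98], requires_grad=True)
v2 = torch.tensor([0.97, 0.01, 0.997, 0.999], requires_grad=True)
mi = kde_mutual_information(v1, v2)
mi.backward()  # gradients flow back to v1 and v2
```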
Alternatively, you can use the differentiable histogram functions in Kornia to compute the MI metric yourself if you want more control.
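A sketch of that route, assuming Kornia's `kornia.enhance.histogram2d` signature (batched `(B, N)` inputs with tensor-valued bins and bandwidth; check the Kornia docs for your version):

```python
import torch
import kornia

def kornia_mi(x, y, num_bins=16, bandwidth=0.1, eps=1e-10):
    # histogram2d expects batched (B, N) inputs; treat each vector as one batch.
    bins = torch.linspace(0.0, 1.0, num_bins)
    p_xy = kornia.enhance.histogram2d(x.unsqueeze(0), y.unsqueeze(0),
                                      bins, torch.tensor(bandwidth))[0]
    p_x, p_y = p_xy.sum(dim=1), p_xy.sum(dim=0)
    return (p_xy * (torch.log(p_xy + eps)
                    - torch.log(p_x.unsqueeze(1) + eps)
                    - torch.log(p_y.unsqueeze(0) + eps))).sum()
```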
