I am having trouble interpreting the results of the mi.plugin() (or mi.empirical()) function from the entropy package. As far as I understand, MI = 0 means that the two variables being compared are completely independent, and as MI increases, the association between them becomes increasingly non-random.

Why, then, do I get a value of 0 when running the following in R (using the {entropy} package):

mi.plugin( rbind( c(1, 2, 3), c(1, 2, 3) ) )

when I'm comparing two vectors that are exactly the same?

I assume my confusion comes from a theoretical misunderstanding on my part. Can someone tell me where I've gone wrong?

Thanks in advance.

Tal Galili
lemhop
  • `mi.plugin()` takes a matrix of joint bin frequencies. See `?mi.plugin`. – NPE Sep 11 '14 at 15:07
  • Thank you @NPE, I may then be using an inappropriate function, so let me expand a little on what I'm trying to do. I have two continuous variables, and I want to know the MI between them. I want to be able to say to what extent I can predict one from the other. Should I calculate the joint bin frequencies for `mi.plugin()`, or is there a more appropriate function I should use? – lemhop Sep 11 '14 at 15:17
  • Never mind, I calculated the joint bin frequencies and got MI scores that now make sense. Ta. – lemhop Sep 11 '14 at 15:35

2 Answers

Use mutinformation(x, y) from the infotheo package, which takes the two vectors of discrete observations directly:

> mutinformation(c(1, 2, 3), c(1, 2, 3) ) 
[1] 1.098612

> mutinformation(seq(1:5),seq(1:5))
[1] 1.609438

Both values are in nats and equal the entropy of the vector (log 3 and log 5), so the normalized mutual information is 1.
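
To make the normalization concrete, here is a minimal base-R sketch that rebuilds the first result from entropies. The helper `entropy_nats` is a hypothetical name introduced for illustration, and dividing by sqrt(H(X) * H(Y)) is only one of several normalizations in use; the answer does not say which one it has in mind.

```r
# Plug-in (empirical) entropy in nats from a vector or table of counts.
# `entropy_nats` is an illustrative helper, not part of infotheo or entropy.
entropy_nats <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # skip empty cells: 0 * log(0) is taken as 0
  -sum(p * log(p))
}

x <- c(1, 2, 3)
y <- c(1, 2, 3)

h_x  <- entropy_nats(table(x))     # H(X)
h_y  <- entropy_nats(table(y))     # H(Y)
h_xy <- entropy_nats(table(x, y))  # joint entropy H(X, Y)

mi  <- h_x + h_y - h_xy            # I(X;Y) = H(X) + H(Y) - H(X,Y)
nmi <- mi / sqrt(h_x * h_y)        # assumed normalization: geometric mean of the entropies

print(c(mi = mi, nmi = nmi))
```

For identical vectors the joint entropy equals each marginal entropy, so the mutual information collapses to H(X) and the ratio is 1 under this normalization.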

Monicam

The mi.plugin function works on the joint frequency matrix of the two random variables. The joint frequency matrix gives the number of times X and Y take each specific pair of outcomes x and y. In your example, X has 3 possible outcomes (x=1, x=2, x=3), and Y also has 3 possible outcomes (y=1, y=2, y=3). Let's go through your example and calculate the joint frequency matrix:

> X=c(1, 2, 3)
> Y=c(1, 2, 3)
> freqs=matrix(sapply(seq(max(X)*max(Y)), function(x) length(which(((X-1)*max(Y)+Y)==x))),ncol=max(X))
> freqs
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

This matrix shows the number of occurrences of each pair X=x and Y=y. For example, there was one observation for which X=1 and Y=1, and there were 0 observations for which X=2 and Y=1. You can now use the mi.plugin function:

> mi.plugin(freqs)
[1] 1.098612
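
As a rough illustration of why the call in the question returned 0, the sketch below computes a plug-in MI from a count matrix in base R. The helper `plugin_mi` is a hypothetical name and is not the entropy package's implementation; it only assumes, as described above, that the input is read as a table of joint counts. It also uses `table(X, Y)` as a shorter way to cross-tabulate paired observations.

```r
# Plug-in MI in nats from a matrix of joint counts (rows = X bins, cols = Y bins).
# `plugin_mi` is an illustrative helper, not the entropy package's code.
plugin_mi <- function(counts) {
  p_xy <- counts / sum(counts)     # joint probabilities
  p_x  <- rowSums(p_xy)            # marginal of the row variable
  p_y  <- colSums(p_xy)            # marginal of the column variable
  expected <- outer(p_x, p_y)      # joint distribution under independence
  nz <- p_xy > 0                   # empty cells contribute nothing
  sum(p_xy[nz] * log(p_xy[nz] / expected[nz]))
}

X <- c(1, 2, 3)
Y <- c(1, 2, 3)

# What the question passed: a 2 x 3 table whose rows are proportional,
# which is exactly what a table of independent variables looks like.
mi_as_passed <- plugin_mi(rbind(X, Y))

# What was intended: cross-tabulate the paired observations first.
freqs <- table(X, Y)
mi_paired <- plugin_mi(freqs)

print(c(as_passed = mi_as_passed, paired = mi_paired))
```

For continuous variables, one option is to bin each variable before cross-tabulating, for example `table(cut(x, 10), cut(y, 10))`; the number of bins is a choice that affects the estimate.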
Roee Anuar