
According to Wikipedia, the uniform distribution is the "maximum entropy probability distribution". Thus, if I have two sequences (one uniformly distributed and one with repeated values), both of length k, then I would expect the entropy of the uniformly distributed sequence to be higher than that of the sequence of repeated values. However, this is not what I observe when running the following code in R:

require(entropy)
entropy(runif(1024), method="ML", unit="log2")
entropy(rep(1,1024), method="ML", unit="log2")

The first call produces around 9.7 bits of entropy, while the second produces exactly 10 bits (log base 2 of 1024 = 10). Why does the uniformly distributed sequence not produce the higher entropy of the two?

Alpha Bravo
  • I thought it might be because of the implementation. If the "MM" method is used, the behavior is as expected, except that the entropy of repeated values depends on those values, which it should not; the "ML" method handles this correctly. Edit: Cross Validated might be a better place anyway. I did not know of it until you suggested it. – Alpha Bravo Jul 07 '16 at 15:13
  • If it's the implementation, then try looking at `getAnywhere("entropy.empirical")`, `getAnywhere("entropy.MillerMadow")` and `getAnywhere("entropy.plugin")`. – slamballais Jul 07 '16 at 15:19
  • Thanks. I've looked at the source code but I'm not sure why it behaves as it does. In summary, it takes the data, bins it with the freqs function, and then applies the log function to it. – Alpha Bravo Jul 07 '16 at 15:24
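
For reference, the ML ("plug-in") estimate discussed in these comments reduces to normalizing the supplied vector to relative frequencies and computing -sum(p * log(p)). A hand-rolled sketch of that computation (illustrative only, not the package's exact source; plugin_entropy is just a made-up name):

plugin_entropy <- function(y, unit = c("log", "log2")) {
  unit <- match.arg(unit)
  p <- y / sum(y)                      # normalize counts/weights to relative frequencies
  p <- p[p > 0]                        # drop zeros so 0 * log(0) contributes nothing
  H <- -sum(p * log(p))                # Shannon entropy in nats
  if (unit == "log2") H <- H / log(2)  # convert to bits
  H
}

plugin_entropy(rep(1, 1024), unit = "log2")
# [1] 10
plugin_entropy(runif(1024), unit = "log2")
# a little under 10 bits, because the random weights are not exactly equal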

1 Answer


I think you are misunderstanding what the first argument, y, in entropy() represents. As mentioned in ?entropy, it gives a vector of counts. Those counts together give the relative frequencies of each of the symbols from which messages on this "discrete source of information" are composed.

To see how that plays out, have a look at a simpler example, that of a binary information source with just two symbols (1/0, on/off, A/B, what have you). In this case, all of the following will give the entropy for a source in which the relative frequencies of the two symbols are the same (i.e. half the symbols are As and half are Bs):

entropy(c(0.5, 0.5))
# [1] 0.6931472
entropy(c(1,1))
# [1] 0.6931472
entropy(c(1000,1000))
# [1] 0.6931472
entropy(c(0.0004, 0.0004))  
# [1] 0.6931472
entropy(rep(1,2))
# [1] 0.6931472

Because those all refer to the same underlying distribution, in which probability is maximally spread out among the available symbols, they each give the highest possible entropy for a two-state information source (log(2) ≈ 0.6931472).
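
To connect this back to the bits in the question, the same maximum shows up as exactly 1 bit when you ask for base-2 units (a quick check, assuming the default "ML" method):

log(2)                               # maximum entropy for two equally likely symbols, in nats
# [1] 0.6931472
entropy(c(0.5, 0.5), unit = "log2")  # the same distribution, reported in bits
# [1] 1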

When you instead do entropy(runif(2)), you are supplying two relative frequencies that are themselves drawn at random from the uniform distribution. Unless those two randomly drawn numbers happen to be exactly equal, you are telling entropy() that you have an information source with two symbols used at different frequencies, so the computed entropy will always be lower than log(2). Here's a quick example to illustrate what I mean:

set.seed(4)
(x <- runif(2))
# [1] 0.585800305 0.008945796
freqs.empirical(x)  ## Helper function called by `entropy()` via `entropy.empirical()`
# [1] 0.98495863 0.01504137

## Low entropy, as you should expect 
entropy(x)
# [1] 0.07805556

## Essentially the same thing; you can interpret this as the expected entropy
## of a source from which a message with 984 '0's and 15 '1's has been observed
entropy(c(984, 15))

In summary, by passing the y= argument a long string of 1s, as in entropy(rep(1, 1024)), you are describing an information source that is a discrete analogue of the uniform distribution. Over the long run or in a very long message, each of its 1024 letters is expected to occur with equal frequency, and you can't get any more uniform than that!
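
If what you actually want is the entropy of the empirical distribution of the values in a numeric sample (as with runif(1024) above), you need to turn the sample into bin counts first. A minimal sketch, assuming the package's discretize() helper and an arbitrary choice of 16 equal-width bins:

## Uniform sample: bin 1024 draws into 16 bins on [0, 1], then estimate entropy from the counts
u <- runif(1024)
entropy(discretize(u, numBins = 16, r = c(0, 1)), method = "ML", unit = "log2")
# expected: just under log2(16) = 4 bits, since the bin counts are only approximately equal

## Constant "sample": every value lands in the same bin
entropy(discretize(rep(1, 1024), numBins = 16, r = c(0, 2)), method = "ML", unit = "log2")
# expected: 0 bits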

Josh O'Brien
  • Thanks for the help. What I want to do is measure the entropy of an array. Each element in the array contains a number which corresponds to the number of times that element was accessed in memory (it is for security research). In order to measure the entropy correctly, would I need to apply the freqs function on that array first? – Alpha Bravo Jul 07 '16 at 18:33
  • @AlphaBravo Sorry, I won't be able to help you or give you any useful advice about how to apply entropy computations to your particular application. Since `freqs.empirical()` normalizes any data handed to it (so that their frequencies sum to `1`), though, it makes no difference whether you hand it raw counts or already normalized data. – Josh O'Brien Jul 07 '16 at 18:50
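
A quick check of that last point (assuming the default "ML" method): raw counts and the corresponding normalized frequencies give identical estimates.

entropy(c(2, 1, 1))          # raw counts
# [1] 1.039721
entropy(c(0.5, 0.25, 0.25))  # the same data, already normalized
# [1] 1.039721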