I have per-iteration statistics over some properties, like:
1st iter : p1:10 p2:0 p3:12 p4:33 p5:0.17 p6:ok p8:133 p9:89
2nd iter : p1:43 p2:1 p6:ok p8:12 p9:33
3rd iter : p1:14 p2:0 p3:33 p5:0.13 p9:2
...
(p1 -> number of tries, p2 -> whether the try went well, p3..pN -> properties of the try).
I need to calculate the amount of information carried by each property. After quantizing the values (for example, to 10 levels) so that all inputs are on the same scale, the input file looks like:
p0: 4 3 2 4 5 5 6 7
p3: 4 5 3 3
p4: 5 3 3 2 1 2 3
...
Here p0 = funct(p1, p2). Not every input line has every pK, so len(pK) <= len(p0).
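For reference, the quantization step is roughly this (only a sketch, assuming simple equal-width binning into 10 levels via np.digitize; the exact binning doesn't matter for the question):

import numpy as np

def quantize(values, levels=10):
    # equal-width bins between min and max, mapped to levels 1..levels
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), levels + 1)
    # use only the interior edges so every value lands in 1..levels
    return np.digitize(values, edges[1:-1]) + 1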
Now, I know how to calculate the Shannon entropy of each property (each line above) on its own. From there I need mutual information, but computing the joint entropy required for I(p0, pK) is where I'm stuck, because of the different lengths.
I'm calculating the entropy of a single property like this:

import numpy as np

def entropy(x):
    x = np.asarray(x)  # make x == c broadcast even if x is a plain list
    probs = [np.mean(x == c) for c in set(x)]  # probability of each level
    return -sum(p * np.log2(p) for p in probs)  # Shannon entropy in bits
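For example, on the p0 line above:

entropy(np.array([4, 3, 2, 4, 5, 5, 6, 7]))  # -> 2.5 bits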
So, for the joint entropy, do I generate the input array x as pairs, i.e. use zip(p0, pK) (and take set() over those pairs) instead of set(x)?
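To make that concrete, here is a minimal sketch of what I have in mind (assuming p0 has already been restricted to just the iterations where pK is present, so both arrays have the same length; the data below is hypothetical):

def joint_entropy(x, y):
    # treat each (x_i, y_i) pair as a single symbol
    pairs = list(zip(x, y))
    probs = [pairs.count(c) / len(pairs) for c in set(pairs)]
    return -sum(p * np.log2(p) for p in probs)

def mutual_information(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y), reusing entropy() from above
    return entropy(x) + entropy(y) - joint_entropy(x, y)

p0 = [4, 3, 2, 4]   # p0 values on the lines where p3 exists
p3 = [4, 5, 3, 3]
print(mutual_information(p0, p3))

My doubt is whether simply restricting p0 to the matching iterations and pairing up the arrays like this is the right way to handle the different lengths.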