I have per-iteration statistics over some properties, like:
1st iter : p1:10 p2:0 p3:12 p4:33 p5:0.17 p6:ok p8:133 p9:89
2nd iter : p1:43 p2:1 p6:ok p8:12 p9:33
3rd iter : p1:14 p2:0 p3:33 p5:0.13 p9:2
...
(p1 -> number of tries, p2 -> whether the try went well, p3..pN -> properties of the try).
I need to calculate the amount of information carried by each property. After quantizing the values (for example, to 10 levels) so that all inputs are on the same scale, the input file looks like:
p0: 4 3 2 4 5 5 6 7
p3: 4 5 3 3
p4: 5 3 3 2 1 2 3
...
Here p0 = funct(p1, p2). Not every input line has every pK, so len(pK) <= len(p0).
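For reference, the quantization step is roughly this (only a sketch, assuming simple equal-width binning into 10 levels via np.digitize; the exact binning doesn't matter for the question):

import numpy as np

def quantize(values, levels=10):
    # equal-width bins between min and max, mapped to levels 1..levels
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), levels + 1)
    # use only the interior edges so every value lands in 1..levels
    return np.digitize(values, edges[1:-1]) + 1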
Now, I know how to calculate the Shannon entropy of each property (each line above) on its own. From there I need mutual information, but computing the joint entropy required for I(p0, pK) is where I'm stuck, because of the different lengths.
I'm calculating the entropy of a single property like this:

import numpy as np

def entropy(x):
    x = np.asarray(x)  # make x == c broadcast even if x is a plain list
    probs = [np.mean(x == c) for c in set(x)]  # probability of each level
    return -sum(p * np.log2(p) for p in probs)  # Shannon entropy in bits
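For example, on the p0 line above:

entropy(np.array([4, 3, 2, 4, 5, 5, 6, 7]))  # -> 2.5 bits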
So, for the joint entropy, do I generate the input array x as pairs, i.e. use zip(p0, pK) (and take set() over those pairs) instead of set(x)?
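To make that concrete, here is a minimal sketch of what I have in mind (assuming p0 has already been restricted to just the iterations where pK is present, so both arrays have the same length; the data below is hypothetical):

def joint_entropy(x, y):
    # treat each (x_i, y_i) pair as a single symbol
    pairs = list(zip(x, y))
    probs = [pairs.count(c) / len(pairs) for c in set(pairs)]
    return -sum(p * np.log2(p) for p in probs)

def mutual_information(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y), reusing entropy() from above
    return entropy(x) + entropy(y) - joint_entropy(x, y)

p0 = [4, 3, 2, 4]   # p0 values on the lines where p3 exists
p3 = [4, 5, 3, 3]
print(mutual_information(p0, p3))

My doubt is whether simply restricting p0 to the matching iterations and pairing up the arrays like this is the right way to handle the different lengths.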