Python's implementation of Mutual Information

Question

I am having some issues implementing the Mutual Information Function that Python's machine learning libraries provide, in particular : sklearn.metrics.mutual_info_score(labels_true, labels_pred, contingency=None)

(http://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html)

I am trying to implement the example I find in the Stanford NLP tutorial site:

example

The site is found here : http://nlp.stanford.edu/IR-book/html/htmledition/mutual-information-1.html#mifeatsel2

The problem is I keep getting different results, without figuring out the reason yet.

I get the concept of Mutual Information and feature selection, I just don't understand how it is implemented in Python. What I do is that I provide the mutual_info_score method with two arrays based on the NLP site example, but it outputs different results. The other interesting fact is that anyhow you play around and change numbers on those arrays you are most likely to get the same result. Am I supposed to use another data structure specific to Python or what is the issue behind this? If anyone has used this function successfully in the past it would be of a great help to me, thank you for your time.

you should provide us with working example of what is exactly "wrong". — lejlot, Jul 12 '14 at 21:55

lyx · Accepted Answer · 2014-07-18T14:12:58.707

I encountered the same issue today. After a few trials I found the real reason: you take log2 if you strictly followed NLP tutorial, but sklearn.metrics.mutual_info_score uses natural logarithm(base e, Euler's number). I didn't find this detail in sklearn documentation...

I verified this by:

import numpy as np
def computeMI(x, y):
    sum_mi = 0.0
    x_value_list = np.unique(x)
    y_value_list = np.unique(y)
    Px = np.array([ len(x[x==xval])/float(len(x)) for xval in x_value_list ]) #P(x)
    Py = np.array([ len(y[y==yval])/float(len(y)) for yval in y_value_list ]) #P(y)
    for i in xrange(len(x_value_list)):
        if Px[i] ==0.:
            continue
        sy = y[x == x_value_list[i]]
        if len(sy)== 0:
            continue
        pxy = np.array([len(sy[sy==yval])/float(len(y))  for yval in y_value_list]) #p(x,y)
        t = pxy[Py>0.]/Py[Py>0.] /Px[i] # log(P(x,y)/( P(x)*P(y))
        sum_mi += sum(pxy[t>0]*np.log2( t[t>0]) ) # sum ( P(x,y)* log(P(x,y)/( P(x)*P(y)) )
    return sum_mi

If you change this np.log2 to np.log, I think it would give you the same answer as sklearn. The only difference is that when this method returns 0, sklearn will return a number very near to 0. ( And of course, use sklearn if you don't care about log base, my piece of code is just for demo, it gives poor performance...)

FYI, 1)sklearn.metrics.mutual_info_score takes lists as well as np.array; 2) the sklearn.metrics.cluster.entropy uses also log, not log2

Edit: as for "same result", I'm not sure what you really mean. In general, the values in the vectors don't really matter, it is the "distribution" of values that matters. You care about P(X=x), P(Y=y) and P(X=x,Y=y), not the value x,y.

could you show a running example? I got different values on [1,2,2], [1,2,2] as arguments for this routine and mutual_info_score — user1603472, Dec 04 '14 at 21:54

score -4 · Answer 2 · edited Feb 22 '18 at 22:40

-4

The code below should provided a result: 0.00011053558610110256

c=np.concatenate([np.ones(49), np.zeros(27652), np.ones(141), np.zeros(774106) ])
t=np.concatenate([np.ones(49), np.ones(27652), np.zeros(141), np.zeros(774106)])

computeMI(c,t)

edited Feb 22 '18 at 22:40

Dov Benyomin Sohacheski

7,133
7
38
64

answered Feb 22 '18 at 21:48

Mingyuan Tian

1
1

Python's implementation of Mutual Information

2 Answers2

Linked