I've used the following code to compute the mutual information and chi-squared values for feature selection in sentiment analysis.
MI = ((N11/N)*math.log((N*N11)/((N11+N10)*(N11+N01)),2)
      + (N01/N)*math.log((N*N01)/((N01+N00)*(N11+N01)),2)
      + (N10/N)*math.log((N*N10)/((N10+N11)*(N00+N10)),2)
      + (N00/N)*math.log((N*N00)/((N10+N00)*(N01+N00)),2))
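where N11, N01, N10 and N00 are the observed co-occurrence frequencies of the two features in my data set (the two subscripts give the 0/1 value of each feature), and N = N00 + N01 + N10 + N11 is their total.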
NOTE: I am trying to calculate the mutual information and chi-squared values between two features, not between a particular feature and a class. I'm doing this to find out whether the two features are related in any way.
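The mutual information formula above can be written as a small function. This is only a minimal sketch, assuming Python 3 (true division), that N is the sum of the four counts, and that a cell with a zero count contributes nothing (the usual 0 * log 0 = 0 convention). The name `mutual_information` and its argument order are illustrative and not part of the code quoted above.

```python
import math

def mutual_information(n00, n01, n10, n11):
    """Mutual information (in bits) of two binary features from a 2x2 table."""
    n = n00 + n01 + n10 + n11
    row = {0: n00 + n01, 1: n10 + n11}   # totals over the first subscript
    col = {0: n00 + n10, 1: n01 + n11}   # totals over the second subscript
    cells = {(0, 0): n00, (0, 1): n01, (1, 0): n10, (1, 1): n11}
    mi = 0.0
    for (a, b), count in cells.items():
        if count == 0:
            continue  # assumed convention: an empty cell contributes 0
        mi += (count / n) * math.log2(n * count / (row[a] * col[b]))
    return mi

# Observed frequencies quoted at the end of the post
mi = mutual_information(312412, 276116, 51120, 68846)
print(f"MI = {mi:.5f} bits")
```

As a sanity check, a table with equal counts in all four cells gives 0 bits, and a table where the two features always agree (with both values equally frequent) gives 1 bit.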
The chi-squared formula I've used is:
E00 = N*((N00+N10)/N)*((N00+N01)/N)
E01 = N*((N01+N11)/N)*((N01+N00)/N)
E10 = N*((N10+N11)/N)*((N10+N00)/N)
E11 = N*((N11+N10)/N)*((N11+N01)/N)
chi = ((N11-E11)**2)/E11 + ((N00-E00)**2)/E00 + ((N01-E01)**2)/E01 + ((N10-E10)**2)/E10
where E00, E01, E10 and E11 are the expected frequencies under independence.
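The same chi-squared computation can be sketched as a self-contained function that builds each expected frequency from the row and column totals. It assumes Python 3 (true division), that N is the sum of the four counts, and that no row or column total is zero. The name `chi_squared` is illustrative only.

```python
def chi_squared(n00, n01, n10, n11):
    """Pearson chi-squared statistic for a 2x2 table of observed counts."""
    n = n00 + n01 + n10 + n11
    row = (n00 + n01, n10 + n11)         # totals over the first subscript
    col = (n00 + n10, n01 + n11)         # totals over the second subscript
    observed = ((n00, n01), (n10, n11))
    chi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            # Expected count if the two features were independent;
            # assumes every row and column total is non-zero.
            expected = row[a] * col[b] / n
            chi += (observed[a][b] - expected) ** 2 / expected
    return chi

# Observed frequencies quoted at the end of the post
chi = chi_squared(312412, 276116, 51120, 68846)
print(f"chi-squared = {chi:.1f}")
```

With the observed frequencies from the post, this should agree with the chi-squared value reported there.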
By the definition of mutual information, a low value should mean that one feature gives me little information about the other. By the definition of chi-squared, a low value should mean that the two features are independent.
But for one particular pair of features, I got a mutual information score of 0.00416 and a chi-squared value of 4373.9. This doesn't make sense to me: the mutual information score indicates that the features aren't closely related, but the chi-squared value seems high enough to indicate that they aren't independent either. I think I'm going wrong with my interpretation.
The values I got for the observed frequencies are:
N00 = 312412
N01 = 276116
N10 = 51120
N11 = 68846