
I am using Scikit-learn to train a classification model. I have both discrete and continuous features in my training data.

I want to do feature selection using mutual information.

Features 1, 2 and 3 are discrete. To this end, I tried the code below:

mutual_info_classif(x, y, discrete_features=[1, 2, 3])

but it did not work; it gives me the error:

 ValueError: could not convert string to float: 'INT'
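
For context, a minimal sketch (with made-up values like those in the comments below) that reproduces the error: mutual_info_classif validates X as a numeric array before it ever looks at discrete_features, so any string-valued column fails the float conversion.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy rows resembling the data in the comments: column 0 is numeric,
# columns 1-3 hold strings such as 'tcp', 'http' and 'FIN'.
x = np.array([[0.98, "tcp", "http", "FIN"],
              [8e-06, "udp", "-", "INT"]], dtype=object)
y = np.array([0, 1])

# Raises ValueError: could not convert string to float
mutual_info_classif(x, y, discrete_features=[1, 2, 3])
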
  • I have applied the code that W.P. McNeill proposed in https://stackoverflow.com/q/43643278 but it did not work – samira Nov 25 '18 at 17:42
  • We need more information in order to be able to help you. It might be useful if you copy a simplified example of your code. – silgon Nov 25 '18 at 18:14
  • This is my code: `from sklearn.feature_selection import mutual_info_classif; res_M_train = mutual_info_classif(data_train, Y_train, discrete_features=[1, 2, 3])`. Thank you – samira Nov 25 '18 at 18:23
  • My data is like this: [0.983874,tcp,http,FIN,10,8,816,1172,17.278635,62,252,5976.375,8342.53125,2,2,109.319333,124.932859,5929.211713,192.590406,255,794167371,1624757001,255,0.206572,0.108393,0.098179,82,147,1,184,2,1,1,1,1,2,0,0,1,1,3,0,] As you can see, my first three features are categorical, and I want to calculate the mutual information of each feature: `from sklearn.feature_selection import mutual_info_classif; res_M_train = mutual_info_classif(data_train, Y_train, discrete_features=[1, 2, 3])` – samira Nov 25 '18 at 18:28

3 Answers


A simple example with the mutual information classifier:

import numpy as np
from sklearn.feature_selection import mutual_info_classif
X = np.array([[0, 0, 0],
              [1, 1, 0],
              [2, 0, 1],
              [2, 0, 1],
              [2, 0, 1]])
y = np.array([0, 1, 2, 2, 1])
mutual_info_classif(X, y, discrete_features=True)
# result: array([0.67301167, 0.22314355, 0.39575279])
silgon
  • But I have mixed features, like this: X = np.array([[0, a, 0], [1, b, 0], [2, c, 1], [2, d, 1], [2, a, 1]]) – samira Nov 25 '18 at 18:35
  • This is a row from my data: [8e-06,"udp","-","INT",2,0,1762,0,125000.0003,254,0,881000000.0,0.0,0,0,0.008,0.0,0.0,0.0,0,0,0,0,0.0,0.0,0.0,881,0,0,0,2,2,1,1,1,2,0,0,0,1,2,0] It seems that the first three features cause the problem – samira Nov 26 '18 at 00:12
  • If you're using categories and you have string information, take a look at [`get_dummies`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html) – silgon Nov 26 '18 at 09:29

mutual_info_classif can only take numeric data. You need to label-encode the categorical features and then run the same code.

from sklearn.preprocessing import LabelEncoder

x1 = x.apply(LabelEncoder().fit_transform)

Then run the exact same code you were running.

mutual_info_classif(x1, y, discrete_features=[1, 2, 3])
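
Putting it together, a self-contained sketch on toy data shaped like the question's (the values are made up). Note that the one-liner above would also rank-encode the continuous column, so this variation encodes only the string columns:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import mutual_info_classif

# Made-up mixed data: column 0 is continuous, columns 1-3 are categorical.
x = pd.DataFrame([[0.98, "tcp", "http", "FIN"],
                  [0.01, "udp", "-", "INT"],
                  [0.55, "tcp", "http", "FIN"],
                  [0.72, "udp", "dns", "INT"],
                  [0.33, "tcp", "-", "FIN"],
                  [0.64, "udp", "dns", "INT"]])
y = [0, 1, 0, 1, 0, 1]

x1 = x.copy()
for col in [1, 2, 3]:  # encode only the categorical columns
    x1[col] = LabelEncoder().fit_transform(x1[col])

mutual_info_classif(x1, y, discrete_features=[1, 2, 3])
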
Jatin
  • Careful with that, @Jatin; referring to sklearn's docs: `This transformer should be used to encode target values, i.e. y, and not the input X`. So maybe for this case it is a better option to use [OrdinalEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html). – rmoret Dec 22 '21 at 09:03
  • @rmoret Does it matter for calculating mutual information? "Not limited to real-valued random variables and linear dependence like the correlation coefficient, MI is more general and determines how different the joint distribution of the pair (X,Y) is from the product of the marginal distributions of X and Y. MI is the expected value of the pointwise mutual information (PMI)." [Mutual Information](https://en.wikipedia.org/wiki/Mutual_information) Since we only care about shared information, ordering should not matter? – DataJanitor Feb 14 '23 at 08:41
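
Following rmoret's suggestion, a sketch of the same idea with OrdinalEncoder, which is designed for 2-D feature matrices (same made-up toy data as above):

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.feature_selection import mutual_info_classif

# Made-up mixed data: column 0 is continuous, columns 1-3 are categorical.
x = pd.DataFrame([[0.98, "tcp", "http", "FIN"],
                  [0.01, "udp", "-", "INT"],
                  [0.55, "tcp", "http", "FIN"],
                  [0.72, "udp", "dns", "INT"],
                  [0.33, "tcp", "-", "FIN"],
                  [0.64, "udp", "dns", "INT"]])
y = [0, 1, 0, 1, 0, 1]

# OrdinalEncoder handles all categorical columns in one call.
x[[1, 2, 3]] = OrdinalEncoder().fit_transform(x[[1, 2, 3]])
mutual_info_classif(x, y, discrete_features=[1, 2, 3])

As for the question in the comment: for features flagged as discrete, MI is invariant to how the categories are numbered, so LabelEncoder and OrdinalEncoder give the same estimate here; the docs' warning is about API intent (y vs. X), not the MI value.
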

There is a difference between 'discrete' and 'categorical'. In this case, the function demands that the data be numerical. You can use a label encoder if you have ordinal features; otherwise you would have to one-hot encode the nominal features. You can use pd.get_dummies for this purpose, as in the sketch below.
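
A minimal sketch of the pd.get_dummies route on made-up data (the column names are hypothetical). The dummy columns are discrete while the original float column is not, so a boolean mask is handy:

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

x = pd.DataFrame({"dur": [0.98, 0.01, 0.55, 0.72, 0.33, 0.64],
                  "proto": ["tcp", "udp", "tcp", "udp", "tcp", "udp"]})
y = [0, 1, 0, 1, 0, 1]

# One-hot encode the nominal column: proto -> proto_tcp, proto_udp
x1 = pd.get_dummies(x, columns=["proto"])

# Mark only the dummy columns as discrete.
discrete = x1.columns.str.startswith("proto_")
mutual_info_classif(x1, y, discrete_features=discrete)
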

Parul Singh
  • Same here. Does it matter whether you have ordinal features for calculating mutual information? "Not limited to real-valued random variables and linear dependence like the correlation coefficient, MI is more general and determines how different the joint distribution of the pair (X,Y) is from the product of the marginal distributions of X and Y. MI is the expected value of the pointwise mutual information (PMI)." [Mutual Information](https://en.wikipedia.org/wiki/Mutual_information) Since we only care about shared information, ordering should not matter? – DataJanitor Feb 14 '23 at 08:44