I have to compute mutual information (MI) for continuous/numeric features, and I want to use it for feature selection. The feature set is described below.

feature1: can assume any value between 1 and 10000

feature2: measures time spent on something, so it can assume any value, though in practice the values are (large) integers

.... I have features of this kind.

I am confused about how to apply the mutual information formula to these features. Wikipedia says that integration is required for continuous variables.
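For reference, these are the two standard forms of the definition being contrasted here: a double sum for discrete variables and a double integral for continuous ones. The symbols are the usual ones and are not taken from the question itself: p(x, y) is the joint distribution (or density) and p(x), p(y) are the marginals.

```latex
% Discrete X and Y: sum over all value pairs
I(X;Y) = \sum_{x} \sum_{y} p(x,y) \, \log \frac{p(x,y)}{p(x)\,p(y)}

% Continuous X and Y: the sums become integrals over the densities
I(X;Y) = \int \!\! \int p(x,y) \, \log \frac{p(x,y)}{p(x)\,p(y)} \; dx \, dy
```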

Do I need to discretize the features before applying MI?

alex

1 Answer

I think you need to discretize the features before applying MI.

When information gain is used for feature selection on a continuous variable, a split point is chosen that divides the variable's value range into separate parts, which requires evaluating all candidate split points to find the best one for that feature. I think the same holds for mutual information. Alternatively, you can discretize the continuous range into a fixed number of bins, which I think gives much the same result if the values are smoothly distributed.
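The fixed-bin approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names `equal_width_bins` and `mutual_information`, the choice of 10 equal-width bins, and the synthetic data are all assumptions made for the example. It bins each feature, then computes the plug-in MI estimate between the binned feature and a class label.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins=10):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: everything falls into one bin
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Synthetic data (hypothetical): feature1 determines the label, feature2 is noise.
rng = random.Random(0)
feature1 = [rng.uniform(1, 10000) for _ in range(2000)]
feature2 = [rng.randint(0, 10**6) for _ in range(2000)]
labels = [int(v > 5000) for v in feature1]

mi_f1 = mutual_information(equal_width_bins(feature1), labels)
mi_f2 = mutual_information(equal_width_bins(feature2), labels)
print(f"MI(feature1; label) = {mi_f1:.3f} nats")
print(f"MI(feature2; label) = {mi_f2:.3f} nats")
```

Features can then be ranked by their MI score. The estimate depends on the number of bins, so it is worth trying a few values. If scikit-learn is available, `sklearn.feature_selection.mutual_info_classif` estimates MI for continuous features without explicit binning, which may be an alternative worth comparing against.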

michaeltang
  • Thanks @michaeltang. I believe the same logic also applies to numerical attributes. Am I correct? – alex Apr 23 '14 at 00:31