How to use information gain in feature selection?
Information gain (InfoGain(t)) measures the number of bits of information obtained for the prediction of a class (c) by knowing the presence or absence of a term (t) in a document.
Concisely, information gain measures the reduction in entropy of the class variable after the value of the feature is observed. In other words, information gain for classification reflects how common a feature is in a particular class compared to how common it is in all the other classes.
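With T as the binary indicator for the presence or absence of t, one standard way to write this (a sketch, assuming base-2 logarithms so the result is in bits) is:

$$
\mathrm{InfoGain}(t) = H(C) - H(C \mid T)
$$

where

$$
H(C) = -\sum_{c} P(c)\,\log_2 P(c), \qquad
H(C \mid T) = -\sum_{v \in \{t,\,\bar{t}\}} P(v) \sum_{c} P(c \mid v)\,\log_2 P(c \mid v).
$$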
In text classification, the features are the terms that appear in the documents (a.k.a. the corpus). Consider two terms in the corpus, term1 and term2. If term1 reduces the entropy of the class variable by a larger amount than term2, then term1 is more useful than term2 for document classification. A hypothetical computation follows.
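As a hypothetical numeric illustration (the conditional entropies below are invented for the sake of the example): suppose the classes are balanced, so H(C) = 1 bit, and suppose observing term1 leaves H(C | term1) = 0.40 bits while observing term2 leaves H(C | term2) = 0.95 bits. Then

$$
\mathrm{InfoGain}(\text{term1}) = 1 - 0.40 = 0.60 \ \text{bits}, \qquad
\mathrm{InfoGain}(\text{term2}) = 1 - 0.95 = 0.05 \ \text{bits},
$$

so term1 removes far more uncertainty about the class and is the stronger feature.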
Example in the context of sentiment classification
A word that occurs primarily in positive movie reviews and rarely in negative reviews carries a lot of information. For example, the presence of the word “magnificent” in a movie review is a strong indicator that the review is positive. That makes “magnificent” a highly informative word.
Compute entropy and information gain in Python
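The sketch below is one minimal way to do this (an illustration, not a specific library API: it assumes each document is represented as a set of tokens, and the toy corpus and the helper names entropy and information_gain are made up for the example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(documents, labels, term):
    """InfoGain(t) = H(C) - H(C | T): entropy of the class labels minus
    the expected entropy after splitting on presence/absence of `term`."""
    with_term = [lab for doc, lab in zip(documents, labels) if term in doc]
    without_term = [lab for doc, lab in zip(documents, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + \
                  (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional

# Toy corpus: each document is a set of tokens, paired with a class label.
docs = [
    {"magnificent", "plot", "acting"},      # positive review
    {"magnificent", "soundtrack"},          # positive review
    {"boring", "plot"},                     # negative review
    {"boring", "acting", "soundtrack"},     # negative review
]
labels = ["pos", "pos", "neg", "neg"]

print(information_gain(docs, labels, "magnificent"))  # 1.0 bit: perfectly separates the classes
print(information_gain(docs, labels, "plot"))         # 0.0 bits: equally common in both classes
```

In a real feature-selection pipeline you would compute this score for every term in the vocabulary and keep the highest-scoring terms as features.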