How to use information gain in feature selection?
Information gain (InfoGain(t)) measures the number of bits of information obtained for the prediction of a class (c) by knowing the presence or absence of a term (t) in a document.
Concisely, information gain measures the reduction in entropy of the class variable after the value of the feature is observed. In other words, information gain for classification reflects how common a feature is in a particular class compared to how common it is in all the other classes.
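With T as the binary indicator for the presence or absence of t, one standard way to write this (a sketch, assuming base-2 logarithms so the result is in bits) is:

$$
\mathrm{InfoGain}(t) = H(C) - H(C \mid T)
$$

where

$$
H(C) = -\sum_{c} P(c)\,\log_2 P(c), \qquad
H(C \mid T) = -\sum_{v \in \{t,\,\bar{t}\}} P(v) \sum_{c} P(c \mid v)\,\log_2 P(c \mid v).
$$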
In text classification, the features are the terms that appear in the documents (a.k.a. the corpus). Consider two terms in the corpus, term1 and term2. If term1 reduces the entropy of the class variable by a larger amount than term2, then term1 is more useful than term2 for document classification. A hypothetical computation follows.
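As a hypothetical numeric illustration (the conditional entropies below are invented for the sake of the example): suppose the classes are balanced, so H(C) = 1 bit, and suppose observing term1 leaves H(C | term1) = 0.40 bits while observing term2 leaves H(C | term2) = 0.95 bits. Then

$$
\mathrm{InfoGain}(\text{term1}) = 1 - 0.40 = 0.60 \ \text{bits}, \qquad
\mathrm{InfoGain}(\text{term2}) = 1 - 0.95 = 0.05 \ \text{bits},
$$

so term1 removes far more uncertainty about the class and is the stronger feature.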
Example in the context of sentiment classification
A word that occurs primarily in positive movie reviews and rarely in negative reviews carries a lot of information. For example, the presence of the word “magnificent” in a movie review is a strong indicator that the review is positive. That makes “magnificent” a highly informative word.
Compute entropy and information gain in Python
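The sketch below is one minimal way to do this (an illustration, not a specific library API: it assumes each document is represented as a set of tokens, and the toy corpus and the helper names entropy and information_gain are made up for the example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(documents, labels, term):
    """InfoGain(t) = H(C) - H(C | T): entropy of the class labels minus
    the expected entropy after splitting on presence/absence of `term`."""
    with_term = [lab for doc, lab in zip(documents, labels) if term in doc]
    without_term = [lab for doc, lab in zip(documents, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + \
                  (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional

# Toy corpus: each document is a set of tokens, paired with a class label.
docs = [
    {"magnificent", "plot", "acting"},      # positive review
    {"magnificent", "soundtrack"},          # positive review
    {"boring", "plot"},                     # negative review
    {"boring", "acting", "soundtrack"},     # negative review
]
labels = ["pos", "pos", "neg", "neg"]

print(information_gain(docs, labels, "magnificent"))  # 1.0 bit: perfectly separates the classes
print(information_gain(docs, labels, "plot"))         # 0.0 bits: equally common in both classes
```

In a real feature-selection pipeline you would compute this score for every term in the vocabulary and keep the highest-scoring terms as features.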