
I have to learn about information gain for feature selection right now, but I don't have a clear understanding of it. I am a newbie, and I'm confused about it.

How to use IG in feature selection (manual calculation)?

This is the only clue I have so far. Can anyone help me with how to use the formula:

[image: the information gain formula]

then this is the example:

[image: a worked example of the calculation]

Ankit Agrawal
  • please explain what you do and do not understand (the formula? the purpose of information gain? how to code it? what's a probability?) – Pascal Soucy Dec 15 '16 at 21:50
  • I hope my explanation will help you. – Wasi Ahmad Dec 23 '16 at 23:06
  • It is a good question. I also have a related question about information gain: in some cases we need to calculate log(0), which is not possible. What should we do in such situations? – Hamed Baziyad Jun 06 '20 at 15:11

2 Answers


How to use information gain in feature selection?

Information gain (InfoGain(t)) measures the number of bits of information obtained for prediction of a class (c) by knowing the presence or absence of a term (t) in a document.

Concisely, the information gain is a measure of the reduction in entropy of the class variable after the value for the feature is observed. In other words, information gain for classification is a measure of how common a feature is in a particular class compared to how common it is in all other classes.

In text classification, the features are the terms that appear in the documents (the corpus). Consider two terms in the corpus, term1 and term2. If term1 reduces the entropy of the class variable by a larger amount than term2, then term1 is more useful than term2 for document classification.
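For reference, the standard presence/absence form of the formula (the image in the question is not visible here, so this is the usual textbook version rather than necessarily the exact one shown there) is:

$$\mathrm{InfoGain}(t) = -\sum_{c} P(c)\log P(c) \;+\; P(t)\sum_{c} P(c \mid t)\log P(c \mid t) \;+\; P(\bar{t})\sum_{c} P(c \mid \bar{t})\log P(c \mid \bar{t})$$

The first term is the entropy of the class variable; the last two terms together equal minus the conditional entropy of the class given the presence ($t$) or absence ($\bar{t}$) of the term, so the whole expression is exactly the entropy reduction described above.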

Example in the context of sentiment classification

A word that occurs primarily in positive movie reviews and rarely in negative reviews carries a lot of information. For example, the presence of the word “magnificent” in a movie review is a strong indicator that the review is positive. That makes “magnificent” a highly informative word.

Compute entropy and information gain in Python
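The original answer linked to code for this; since that link is not reproduced here, below is a minimal sketch of my own (function and variable names are illustrative, not from the original) that computes the entropy of the class labels and the information gain of a single binary term feature:

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (in bits) of a list of class labels
        total = len(labels)
        counts = Counter(labels)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def information_gain(labels, term_present):
        # Information gain of a binary feature (term present / absent)
        # with respect to the class labels
        base = entropy(labels)
        with_term = [c for c, p in zip(labels, term_present) if p]
        without_term = [c for c, p in zip(labels, term_present) if not p]
        p_t = len(with_term) / len(labels)
        conditional = p_t * entropy(with_term) + (1 - p_t) * entropy(without_term)
        return base - conditional

    # Tiny example: four movie reviews and whether each contains "magnificent"
    labels = ["pos", "pos", "neg", "neg"]
    has_magnificent = [True, True, False, False]
    print(information_gain(labels, has_magnificent))  # 1.0 bit: perfectly informative term

Terms can then be ranked by their information gain and the top-k kept as features. Note that classes with zero count never enter the sum, which matches the usual 0·log 0 = 0 convention and sidesteps the log(0) problem raised in the comments above.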

Wasi Ahmad

The formula comes from mutual information. In this case, you can think of mutual information as how much information the presence of the term t gives us for guessing the class.

[image: the mutual information formula]

Check: https://nlp.stanford.edu/IR-book/html/htmledition/mutual-information-1.html
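The formula given in that chapter (reproduced here from the linked source; check the link for the exact notation and a worked example) is:

$$I(U;C) = \sum_{e_t \in \{1,0\}} \sum_{e_c \in \{1,0\}} P(U = e_t, C = e_c) \log_2 \frac{P(U = e_t, C = e_c)}{P(U = e_t)\,P(C = e_c)}$$

where $e_t$ indicates whether a document contains the term and $e_c$ whether it belongs to the class.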

Parag