In what scenario is maximizing information gain not equivalent to minimizing entropy? The broader question is: why do we need the concept of information gain at all? Is it not sufficient to work only with entropy to decide the next optimal attribute of a decision tree?
Asked by Pradeep Vairamani
- See the accepted answer here: http://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain – Lior Kogan Nov 21 '15 at 13:57
- Information_Gain = Entropy_before - Entropy_after – Lior Kogan Nov 21 '15 at 13:59
- Isn't Entropy_before constant? Doesn't that mean that we should only look to minimize Entropy_after? – Pradeep Vairamani Nov 21 '15 at 21:26
- Yes. Maximizing Information_Gain and minimizing Entropy_after are the same thing. – Lior Kogan Nov 22 '15 at 10:41
- I agree - I cannot see the difference either. But note that sklearn even scales the gain with the fraction of observations in each node https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html which, in my eyes, is just multiplying by a constant, which should make no difference whatsoever. – CutePoison May 10 '20 at 08:42