In what scenario is maximizing information gain not equivalent to minimizing entropy? The broader question is: why do we need the concept of information gain at all? Is it not sufficient to work only with entropy to decide the next optimal attribute of a decision tree?
Asked by Pradeep Vairamani
- See the accepted answer here: http://stackoverflow.com/questions/1859554/what-is-entropy-and-information-gain – Lior Kogan Nov 21 '15 at 13:57
- Information_Gain = Entropy_before - Entropy_after – Lior Kogan Nov 21 '15 at 13:59
- Isn't Entropy_before constant? Doesn't that mean that we should only look to minimize Entropy_after? – Pradeep Vairamani Nov 21 '15 at 21:26
- Yes. Maximizing Information_Gain and minimizing Entropy_after are the same thing. – Lior Kogan Nov 22 '15 at 10:41
- I agree - I cannot see the difference either. But note that sklearn even scales the gain with the fraction of observations in each node https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html which, in my eyes, is just multiplying by a constant, which should make no difference whatsoever. – CutePoison May 10 '20 at 08:42