Say the decision tree has $k$ classes $c_1, c_2, \dots, c_k$ to classify, and the dataset at the parent node is $D$. Let $p_i$ denote the proportion of elements of $D$ labelled with class $c_i$. The Gini impurity is:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{k} p_i^2$$
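Suppose one partitions the node into two subnodes with subsets $D_1$ and $D_2$, which are disjoint and whose union is $D$. How can one prove that the size-weighted Gini impurity of the children does not exceed the Gini impurity of the parent:

$$\mathrm{Gini}(D) \ge \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$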
I understand that the information gain should not be negative, so this inequality should hold. Could anyone help?
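
For what it's worth, a quick numerical check seems to agree with the claim. The sketch below is not a proof: it takes a made-up toy parent node (the labels `"a"`, `"b"`, `"c"` and the helper names `gini` and `weighted_child_gini` are just illustrative assumptions), enumerates every split into two non-empty complementary subsets, and compares the parent's Gini impurity (1 minus the sum of squared class proportions) with the size-weighted average of the two children's impurities.

```python
from collections import Counter
from itertools import combinations


def gini(labels):
    """Gini impurity 1 - sum_i p_i^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_child_gini(left, right):
    """Size-weighted average of the Gini impurities of two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy parent node with k = 3 classes (made-up data).
parent = ["a", "a", "a", "b", "b", "c"]
indices = range(len(parent))

# Smallest impurity decrease over all splits into two non-empty,
# complementary subsets (each split is chosen by the indices sent left).
worst_gain = min(
    gini(parent)
    - weighted_child_gini(
        [parent[i] for i in idx],
        [parent[i] for i in indices if i not in idx],
    )
    for r in range(1, len(parent))
    for idx in combinations(indices, r)
)

# Small tolerance for floating-point rounding.
print(worst_gain >= -1e-12)
```

On this toy example no split increases the weighted impurity, which is consistent with the inequality, but it only covers one small dataset and says nothing about the general case.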