I am trying to apply the formula:
I don't understand why this doesn't work:
```python
import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p, c: p + (1 - (c / count)**2), node)
    print(count, gini)
    print(1 - (node[0] / count)**2, 1 - (node[1] / count)**2)
    return gini
```
Evaluating `gini([[175, 330], [220, 120]])` prints:

```
505 175.57298304087834
0.8799137339476522 0.5729830408783452
340 220.87543252595157
0.5813148788927336 0.8754325259515571
```
Note that the second print statement prints the figures I want to sum for the example input. The return value (the second value printed by the first print statement) should be a number between 0 and 1.
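The printed totals can be reproduced by hand. When `functools.reduce` is called without an initializer, it uses the first element of the iterable as the starting accumulator, so for a two-element node the lambda runs only once, with `p = node[0]` (the raw count 175) and `c = node[1]`. The raw count is never converted to a `1 - p²` term. The sketch below checks this for the first node; the names `buggy`, `by_hand` and `fixed` are illustrative only.

```python
import functools

node = [175, 330]
count = sum(node)  # 505

# No initializer: reduce seeds the accumulator with node[0] = 175 (a raw count)
# and calls the lambda a single time, with c = node[1] = 330.
buggy = functools.reduce(lambda p, c: p + (1 - (c / count) ** 2), node)

# The exact same arithmetic written out explicitly.
by_hand = node[0] + (1 - (node[1] / count) ** 2)

# Passing 0 as the initializer makes the lambda run once per element.
fixed = functools.reduce(lambda p, c: p + (1 - (c / count) ** 2), node, 0)

print(buggy, by_hand)  # identical: 175 plus the 0.5729... term
print(fixed)           # sum of both 1 - p**2 terms
```

Even with the initializer, `fixed` is about 1.45 here, not a value between 0 and 1. Summing `1 - p_i²` over the classes gives (number of classes) - Σ p_i², which for two classes always lies between 1 and 1.5. The standard Gini impurity is 1 - Σ p_i², which for this node is about 0.45.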
What is wrong with my reduce?
The full function I am trying to write is:
```python
import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p, c: p + (1 - (c / count)**2), node)
    print(count, gini)
    print(1 - (node[0] / count)**2, 1 - (node[1] / count)**2)
    return gini

def gini(groups):
    counts = [sum(node) for node in groups]
    count = sum(counts)
    proportions = [n / count for n in counts]
    return sum(gini_node(node) * proportion for node, proportion in zip(groups, proportions))

# test
print(gini([[175, 330], [220, 120]]))
```
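A minimal corrected sketch, assuming the intended formula is the standard Gini impurity 1 - Σ p_i² (the formula referred to at the top of the question is not shown, so this is an assumption). It keeps `functools.reduce` but passes an initializer so every count is converted to a proportion, then subtracts the sum of squared proportions from 1. Returning 0.0 for an empty node is a hypothetical convention added here to avoid dividing by zero when one side of a split is empty.

```python
import functools


def gini_node(node):
    """Gini impurity of one node, assuming the 1 - sum(p_i^2) definition."""
    count = sum(node)
    if count == 0:
        return 0.0  # assumed convention: an empty node has zero impurity
    # The 0.0 initializer means the lambda is applied to every class count,
    # including node[0], instead of using node[0] as the starting value.
    sum_sq = functools.reduce(lambda acc, c: acc + (c / count) ** 2, node, 0.0)
    return 1.0 - sum_sq


def gini(groups):
    """Weighted Gini impurity of a split: each node weighted by its share of samples."""
    counts = [sum(node) for node in groups]
    total = sum(counts)
    return sum(gini_node(node) * n / total for node, n in zip(groups, counts))


print(gini([[175, 330], [220, 120]]))  # roughly 0.454
```

For the example split, the two node impurities are about 0.453 and 0.457, and the weighted result is about 0.454, which falls inside [0, 1]. A plain `sum((c / count) ** 2 for c in node)` would work just as well as the `reduce` call and avoids the initializer pitfall altogether.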