Given the following list of dicts:
policies = [
{"feature_1": "A", "feature_2": "London", "feature_3": 1000, "feature_4": 10},
{"feature_1": "A", "feature_2": "London", "feature_3": 2000, "feature_4": 20},
{"feature_1": "B", "feature_2": "Dublin", "feature_3": 3000, "feature_4": 30},
{"feature_1": "B", "feature_2": "Dublin", "feature_3": 4000, "feature_4": 40},
{"feature_1": "A", "feature_2": "London", "feature_3": 5000, "feature_4": 50},
{"feature_1": "C", "feature_2": "London", "feature_3": 6000, "feature_4": 60}
]
I want to pass the above list of dicts and two lists of field names:
group_fields = ["feature_1", "feature_2"]
sum_fields = ["feature_3", "feature_4"]
and get back
[{'feature_1': 'A', 'feature_2': 'London', 'feature_3': 8000, 'feature_4': 80},
{'feature_1': 'B', 'feature_2': 'Dublin', 'feature_3': 7000, 'feature_4': 70},
{'feature_1': 'C', 'feature_2': 'London', 'feature_3': 6000, 'feature_4': 60}]
So the result is grouped by the group_fields and summed over the sum_fields (both lists are subject to change).
This is closely related to "Group by multiple keys and summarize/average values of a list of dictionaries", but I had problems generalising that approach to my problem. Here is what I have so far:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
grouper = itemgetter(*group_fields)
result = []
for key, grp in groupby(sorted(policies, key=grouper), grouper):
    temp_dict = dict(zip(group_fields, key))
    group_tuple = [(item["feature_3"], item["feature_4"]) for item in grp]
    temp_dict["feature_3"] = sum([item[0] for item in group_tuple])
    temp_dict["feature_4"] = sum([item[1] for item in group_tuple])
    result.append(temp_dict)
pprint(result)
This does work, but I have had to hardcode feature_3 and feature_4. I can't figure out how to abstract that out so that the only place I type those feature names is in the sum_fields variable. I also don't like that I have to iterate over group_tuple once per summed field to get my values out. Can someone please help?
Thanks
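One possible generalisation, as a minimal sketch rather than a definitive answer: accumulate the totals in a single pass with a dictionary keyed on the tuple of group values, so that neither the field names nor the number of fields is hardcoded. The helper name group_and_sum is hypothetical (it does not come from the question), and this swaps the sort-plus-groupby approach for a collections.defaultdict accumulator. It assumes every row contains all the listed fields and that the summed values are numeric.

```python
from collections import defaultdict


def group_and_sum(rows, group_fields, sum_fields):
    """Group rows by group_fields and total each of sum_fields per group.

    Hypothetical helper for illustration; assumes every row has all the
    listed fields and that the summed values are numeric.
    """
    # One running-total dict per distinct combination of group values.
    totals = defaultdict(lambda: dict.fromkeys(sum_fields, 0))
    for row in rows:
        # Always build a tuple, so a single group field works too.
        key = tuple(row[field] for field in group_fields)
        for field in sum_fields:
            totals[key][field] += row[field]
    # Rebuild flat dicts: group fields first, then the summed fields.
    return [
        {**dict(zip(group_fields, key)), **sums}
        for key, sums in totals.items()
    ]


policies = [
    {"feature_1": "A", "feature_2": "London", "feature_3": 1000, "feature_4": 10},
    {"feature_1": "A", "feature_2": "London", "feature_3": 2000, "feature_4": 20},
    {"feature_1": "B", "feature_2": "Dublin", "feature_3": 3000, "feature_4": 30},
    {"feature_1": "B", "feature_2": "Dublin", "feature_3": 4000, "feature_4": 40},
    {"feature_1": "A", "feature_2": "London", "feature_3": 5000, "feature_4": 50},
    {"feature_1": "C", "feature_2": "London", "feature_3": 6000, "feature_4": 60},
]
group_fields = ["feature_1", "feature_2"]
sum_fields = ["feature_3", "feature_4"]

result = group_and_sum(policies, group_fields, sum_fields)
```

Because plain dicts preserve insertion order (Python 3.7+), the groups come back in the order they are first seen, with no sorting step. Each row is visited once and each summed field is updated in that same pass, which avoids building group_tuple and summing over it repeatedly.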