I want to reduce this list of dictionaries so that, for each set of duplicates, only the most recent record is kept. Two records are duplicates when they have the same project_name and the same feature_group_name. How do I go about doing that?
This is how I'm doing it right now, but I'm sure there is a better way that doesn't need pandas:
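import pandas as pd

# Sort newest first, then keep the first row for each (project_name, feature_group_name) pair
d = pd.DataFrame(l)
sorted_df = d.sort_values('datetime', ascending=False).drop_duplicates(
    ['project_name', 'feature_group_name'], keep='first')
sorted_df.to_dict(orient='records')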
Example:
l = [{'project_name': 'test', 'feature_group_name': 'dnb_features',
'datetime': '2020-02-10T21:24:29Z', 'id': '1'},
{'project_name': 'test', 'feature_group_name': 'dnb_features2',
'datetime': '2020-02-10T21:24:29Z', 'id': '2'},
{'project_name': 'test', 'feature_group_name': 'dnb_features',
'datetime': '2020-02-09T21:24:29Z', 'id': '3'},
{'project_name': 'test', 'feature_group_name': 'dnb_features',
'datetime': '2020-02-08T21:24:29Z', 'id': '4'},
{'project_name': 'test', 'feature_group_name': 'dnb_features2',
'datetime': '2020-02-08T21:24:29Z', 'id': '5'},
{'project_name': 'test', 'feature_group_name': 'dnb_features3',
'datetime': '2020-02-08T21:24:29Z', 'id': '6'}]
Desired Result:
[{'project_name': 'test', 'feature_group_name': 'dnb_features',
'datetime': '2020-02-10T21:24:29Z', 'id': '1'},
{'project_name': 'test', 'feature_group_name': 'dnb_features2',
'datetime': '2020-02-10T21:24:29Z', 'id': '2'},
{'project_name': 'test', 'feature_group_name': 'dnb_features3',
'datetime': '2020-02-08T21:24:29Z', 'id': '6'}]
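
For reference, a pandas-free version could look something like the sketch below. It is only a minimal sketch: `latest_records` is a made-up helper name, and it assumes every datetime string uses the same ISO 8601 format with the `Z` suffix, so plain string comparison orders them chronologically. It also relies on dicts preserving insertion order (Python 3.7+), so groups come out in the order they first appear in the input.

```python
def latest_records(records,
                   key_fields=('project_name', 'feature_group_name'),
                   ts_field='datetime'):
    """Keep the newest record for each combination of key_fields.

    Hypothetical helper; assumes the timestamps are same-format ISO 8601
    strings, which compare correctly as plain strings.
    """
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Store the record if the key is new or this record is strictly newer.
        if key not in latest or rec[ts_field] > latest[key][ts_field]:
            latest[key] = rec
    return list(latest.values())


# Sample data from the question
l = [
    {'project_name': 'test', 'feature_group_name': 'dnb_features',
     'datetime': '2020-02-10T21:24:29Z', 'id': '1'},
    {'project_name': 'test', 'feature_group_name': 'dnb_features2',
     'datetime': '2020-02-10T21:24:29Z', 'id': '2'},
    {'project_name': 'test', 'feature_group_name': 'dnb_features',
     'datetime': '2020-02-09T21:24:29Z', 'id': '3'},
    {'project_name': 'test', 'feature_group_name': 'dnb_features',
     'datetime': '2020-02-08T21:24:29Z', 'id': '4'},
    {'project_name': 'test', 'feature_group_name': 'dnb_features2',
     'datetime': '2020-02-08T21:24:29Z', 'id': '5'},
    {'project_name': 'test', 'feature_group_name': 'dnb_features3',
     'datetime': '2020-02-08T21:24:29Z', 'id': '6'},
]

result = latest_records(l)
print([rec['id'] for rec in result])  # ['1', '2', '6']
```

This makes a single pass over the list with no sorting. If two records in the same group share the same timestamp, the one seen first is kept. If the timestamps could come in mixed formats or time zones, parsing them into datetime objects before comparing would be safer than comparing strings.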