
I want to reduce this list of dictionaries so that only the most recent record among duplicates is kept, where two records count as duplicates if they have the same project_name and the same feature_group_name. How do I go about doing that?

The way I'm doing it right now is as follows, but I'm sure there is a better way without the need for pandas:

import pandas as pd

d = pd.DataFrame(l)
# Newest rows first, then keep the first (i.e. newest) row for each key pair
sorted_df = d.sort_values('datetime', ascending=False).drop_duplicates(
    ['project_name', 'feature_group_name'], keep='first')
sorted_df.to_dict(orient='records')
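The same two steps can be mirrored in plain Python: sort newest-first, then keep the first record seen for each (project_name, feature_group_name) pair, which is what drop_duplicates(keep='first') does. This is a minimal sketch, assuming every 'datetime' value uses the same fixed-width ISO 8601 format shown in the example, so plain string comparison orders them chronologically; the function name latest_per_group is hypothetical.

```python
def latest_per_group(records):
    """Keep the newest record for each (project_name, feature_group_name) pair.

    Assumes all 'datetime' strings share one fixed-width ISO 8601 format
    (e.g. '2020-02-10T21:24:29Z'), so string order equals time order.
    """
    # Newest first, like sort_values('datetime', ascending=False).
    # sorted() is stable even with reverse=True, so records with equal
    # timestamps keep their original relative order.
    newest_first = sorted(records, key=lambda r: r['datetime'], reverse=True)

    seen = set()
    result = []
    for record in newest_first:
        key = (record['project_name'], record['feature_group_name'])
        if key not in seen:  # first occurrence is the newest, like keep='first'
            seen.add(key)
            result.append(record)
    return result
```

Like the pandas version, the result comes back in newest-first order rather than in the original list order, and the input list is not modified.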

Example:

l = [{'project_name': 'test', 'feature_group_name': 'dnb_features',
      'datetime': '2020-02-10T21:24:29Z', 'id': '1'},
     {'project_name': 'test', 'feature_group_name': 'dnb_features2',
      'datetime': '2020-02-10T21:24:29Z', 'id': '2'},
     {'project_name': 'test', 'feature_group_name': 'dnb_features',
      'datetime': '2020-02-09T21:24:29Z', 'id': '3'},
     {'project_name': 'test', 'feature_group_name': 'dnb_features',
      'datetime': '2020-02-08T21:24:29Z', 'id': '4'},
     {'project_name': 'test', 'feature_group_name': 'dnb_features2',
      'datetime': '2020-02-08T21:24:29Z', 'id': '5'},
     {'project_name': 'test', 'feature_group_name': 'dnb_features3',
      'datetime': '2020-02-08T21:24:29Z', 'id': '6'}]

Desired Result:

[{'project_name': 'test', 'feature_group_name': 'dnb_features',
  'datetime': '2020-02-10T21:24:29Z', 'id': '1'},
 {'project_name': 'test', 'feature_group_name': 'dnb_features2',
  'datetime': '2020-02-10T21:24:29Z', 'id': '2'},
 {'project_name': 'test', 'feature_group_name': 'dnb_features3',
  'datetime': '2020-02-08T21:24:29Z', 'id': '6'}]
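Sorting is not strictly required: a single pass with a dict keyed on (project_name, feature_group_name) can keep whichever record is newest so far. This sketch parses the timestamps with datetime.strptime instead of comparing strings, assuming they all follow the '%Y-%m-%dT%H:%M:%SZ' pattern used in the example; the names TIMESTAMP_FORMAT, parse_ts and latest_per_group_single_pass are hypothetical. Because dicts preserve insertion order (Python 3.7+) and updating an existing key does not move it, groups come out in the order they first appear in the input.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the example data ('Z' is a literal here).
TIMESTAMP_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def parse_ts(record):
    """Turn a record's 'datetime' string into a datetime object."""
    return datetime.strptime(record['datetime'], TIMESTAMP_FORMAT)


def latest_per_group_single_pass(records):
    """Return the newest record per (project_name, feature_group_name), in one pass."""
    latest = {}  # (project_name, feature_group_name) -> newest record seen so far
    for record in records:
        key = (record['project_name'], record['feature_group_name'])
        current = latest.get(key)
        # Replace only if strictly newer, so ties keep the earlier-listed record.
        if current is None or parse_ts(record) > parse_ts(current):
            latest[key] = record
    return list(latest.values())
```

Calling latest_per_group_single_pass(l) on the example list returns the records with ids '1', '2' and '6', in that order, which matches the desired result.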
Riley Hun

1 Answer

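    # Keep a record only if no record with the same project_name and feature_group_name has a later datetime
    l = [i for i in l if not any((j['project_name'], j['feature_group_name']) == (i['project_name'], i['feature_group_name']) and j['datetime'] > i['datetime'] for j in l)]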

  

  • Even if the solution is correct, you should also give a proper explanation - [from review](https://stackoverflow.com/review/low-quality-posts/25319603) – letsintegreat Feb 11 '20 at 06:46