I have a list of ~12K dictionaries. Each dictionary has the same keys: year, code and category.
L = [{"year": "2015", "code": "VU", "category": "Vulnerable"}, {"year": "2008", "code": "VU", "category": "Vulnerable"}, {"year": "2004", "code": "LC", "category": "Least Concern"}]
I'm trying to create a new dictionary that has each value of code as a key and, as the value for that key, the set of unique years for that code (I don't necessarily need the category key-value pair):
{"VU": {2008, 2015}, "LC": {2004}}
I created a dictionary codes_dict with the correct codes as keys and empty sets as values (since I don't want duplicates, and I really only need the earliest and latest years):
codes = (e['code'] for e in L)
codes_dict = dict.fromkeys(codes, set())
for e in L:
    codes_dict[e['code']].add(e['year'])
However, when I try to populate the values, I get every year added to every code:
{'VU': {'2004', '2008', '2015'}, 'LC': {'2004', '2008', '2015'}}
What am I missing? I tried using a list instead of a set and got the same result (with duplicates). Also, using = instead of add() means only the last value is kept, whereas I want the whole range.
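For reference, here is a minimal sketch of what is probably going on, assuming the cause is that dict.fromkeys(codes, set()) evaluates set() only once, so every key ends up referring to the same set object. The names shared, shares_one_set, years_by_code and year_range are made up for illustration, and converting the years to int is my own choice to match the desired output above.

```python
from collections import defaultdict

L = [
    {"year": "2015", "code": "VU", "category": "Vulnerable"},
    {"year": "2008", "code": "VU", "category": "Vulnerable"},
    {"year": "2004", "code": "LC", "category": "Least Concern"},
]

# dict.fromkeys evaluates set() once, so both keys point at one shared set.
shared = dict.fromkeys(["VU", "LC"], set())
shares_one_set = shared["VU"] is shared["LC"]

# defaultdict(set) builds a fresh set the first time each code is seen.
years_by_code = defaultdict(set)
for e in L:
    years_by_code[e["code"]].add(int(e["year"]))

# Earliest and latest year per code (hypothetical helper mapping).
year_range = {
    code: (min(years), max(years)) for code, years in years_by_code.items()
}

print(dict(years_by_code))
print(year_range)
```

A dict comprehension such as {code: set() for code in codes} would also give each key its own set, if keeping the two-step approach is preferable.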
Performance isn't really an issue, as this is just supposed to be a quick diagnostic.
Bonus: if there is a better way to do this in pandas, I'd love to hear it.
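On the pandas side, one possible sketch, assuming the list loads cleanly into a DataFrame and that the years can be cast to int. The names df, unique_years and year_range are illustrative only.

```python
import pandas as pd

L = [
    {"year": "2015", "code": "VU", "category": "Vulnerable"},
    {"year": "2008", "code": "VU", "category": "Vulnerable"},
    {"year": "2004", "code": "LC", "category": "Least Concern"},
]

df = pd.DataFrame(L)
df["year"] = df["year"].astype(int)  # assumption: every year is a numeric string

# Set of unique years per code, as a plain dict.
unique_years = df.groupby("code")["year"].apply(set).to_dict()

# Earliest and latest year per code, as a small DataFrame indexed by code.
year_range = df.groupby("code")["year"].agg(["min", "max"])

print(unique_years)
print(year_range)
```

If only the earliest and latest years matter, the agg(["min", "max"]) step alone may be enough and the sets can be skipped.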
Thanks!