Group and aggregate a list of dictionaries by multiple keys

Question

I have a list of dictionaries (List[Dict]) that I would like to uniqify (deduplicate) based on two keys. I also want to retain the values of a third key, collecting them into a list under that key so that none are lost. I am using Python 3.x.

Let's assume I have the following list of dictionaries with three keys: number, favorite, and color. I want to uniqify the list elements using the keys number and favorite. However, for dictionaries that share the same values of number and favorite, I'd like to store a list under the key color so that I keep all the colors for that combination of number and favorite. This list should itself contain no duplicates, since repeated colors for the same combination are not needed. Finally, if there is only one color for a combination in the final result, it should be a string and not a list.

lst = [
{'number': 1, 'favorite': False, 'color': 'red'},
{'number': 1, 'favorite': False, 'color': 'green'},
{'number': 1, 'favorite': False, 'color': 'red'},
{'number': 1, 'favorite': True, 'color': 'red'},
{'number': 2, 'favorite': False, 'color': 'red'}]

Using the aforementioned uniqify, I would get the following result:

lst = [
    {'number': 1, 'favorite': False, 'color': {'red', 'green'}},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 2, 'favorite': False, 'color': 'red'},
]

Note that there is only one instance of red where the number is 1 and favorite is False even though it appeared twice in the list before it was uniqified. Also note that when there is only one element for the key color in the second dict, it is a string and not a list.

it basically is a list that serves as a value to the key `color` @Praind — KaanTheGuru, Jan 18 '19 at 06:55
Is it necessary to have a string instead of a list if there's only one element? — Praind, Jan 18 '19 at 06:58
yes it is necessary, will edit question to include that detail. Thank you @Praind — KaanTheGuru, Jan 18 '19 at 07:01
Possible duplicates https://stackoverflow.com/questions/21674331/group-by-multiple-keys-and-summarize-average-values-of-a-list-of-dictionaries and https://stackoverflow.com/questions/18066269/group-by-and-aggregate-the-values-of-a-list-of-dictionaries-in-python — Mazdak, Jan 18 '19 at 11:37

cs95 · Accepted Answer · 2019-01-18T07:10:54.370

Using pure Python, you can insert into an OrderedDict to retain insertion order:

from collections import OrderedDict

d = OrderedDict()
for l in lst:
    d.setdefault((l['number'], l['favorite']), set()).add(l['color'])

[{'number': k[0], 'favorite': k[1], 'color': v.pop() if len(v) == 1 else v} 
    for k, v in d.items()]   
# [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
#  {'color': 'red', 'favorite': True, 'number': 1},
#  {'color': 'red', 'favorite': False, 'number': 2}]

This can also be done quite easily using the pandas GroupBy API:

import pandas as pd

d = (pd.DataFrame(lst)
       .groupby(['number', 'favorite'])
       .color
       .agg(set)
       .reset_index()
       .to_dict('records'))  # the short form 'r' is no longer accepted by pandas 2.0+
d
# [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
#  {'color': {'red'}, 'favorite': True, 'number': 1},
#  {'color': {'red'}, 'favorite': False, 'number': 2}]

If the condition of a string for a single element is required, you can use

[{'color': (lambda v: v.pop() if len(v) == 1 else v)(d_.pop('color')), **d_} 
     for d_ in d]
# [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
#  {'color': 'red', 'favorite': True, 'number': 1},
#  {'color': 'red', 'favorite': False, 'number': 2}]

What if we pass `.color` and `.number` both to the above solution. I have another use case where I need to pass two-column of the dictionary. — tenstormavi, Sep 21 '20 at 16:15
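One reading of the comment above is that two value keys need to be collected per group instead of one. A minimal pure-Python sketch of that case follows, using the same setdefault approach as this answer. The second value key `size` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict

# Hypothetical rows: 'color' and 'size' are the two value keys to collect.
rows = [
    {'number': 1, 'favorite': False, 'color': 'red', 'size': 'S'},
    {'number': 1, 'favorite': False, 'color': 'green', 'size': 'S'},
    {'number': 2, 'favorite': True, 'color': 'red', 'size': 'M'},
]

value_keys = ('color', 'size')
grouped = OrderedDict()
for row in rows:
    key = (row['number'], row['favorite'])
    # One set per value key, created the first time a group is seen.
    sets = grouped.setdefault(key, {name: set() for name in value_keys})
    for name in value_keys:
        sets[name].add(row[name])

# Rebuild one dictionary per (number, favorite) group.
result = [
    {'number': number, 'favorite': favorite, **sets}
    for (number, favorite), sets in grouped.items()
]
```

Each group then carries one set per value key. The single-element-to-string conversion from the answer could be applied to each set in the same way if it is needed.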

Praind · Answer 2 · 2019-01-18T07:14:13.667

A solution in pure Python would be to use a defaultdict with a composite key. You could use that to merge your values. Afterwards you can create a list again out of that dictionary.

from collections import defaultdict

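dct = defaultdict(list)  # the factory must be a callable, not a list instance

for entry in lst:
    colors = dct[(entry['number'], entry['favorite'])]
    if entry['color'] not in colors:  # skip colors already collected for this key
        colors.append(entry['color'])

lst = [{'number': key[0], 'favorite': key[1], 'color': value if len(value) > 1 else value[0]}
    for key, value in dct.items()]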

edited Jan 18 '19 at 07:14

answered Jan 18 '19 at 07:12


*composite key i guess? – Vineeth Sai Jan 18 '19 at 07:13
@Vineeth Sai Indeed ;) – Praind Jan 18 '19 at 07:14
Would this work even if there are more keys than the three mentioned, i.e. if you add a new key? @Praind – KaanTheGuru Jan 18 '19 at 07:16
Of course, you just have to add them to the composite key. E.g. `dct[(entry['number'], entry['favorite'], entry['otherKey'])].append(entry['color'])` – Praind Jan 18 '19 at 07:19
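The composite-key idea from the comment above can be generalized so that the grouping keys are passed in rather than hard-coded. This is only a sketch: the function name `group_unique` and the extra key `shape` are hypothetical, and the output order relies on dictionaries preserving insertion order (guaranteed from Python 3.7).

```python
def group_unique(rows, group_keys, value_key):
    """Group rows by group_keys and collect unique value_key values in first-seen order."""
    grouped = {}
    for row in rows:
        # Build the composite key from however many grouping keys were given.
        key = tuple(row[k] for k in group_keys)
        values = grouped.setdefault(key, [])
        if row[value_key] not in values:
            values.append(row[value_key])

    result = []
    for key, values in grouped.items():
        merged = dict(zip(group_keys, key))
        # A single value stays a plain value; several values stay a list.
        merged[value_key] = values[0] if len(values) == 1 else values
        result.append(merged)
    return result


# Hypothetical rows with a fourth key, 'shape', added to the composite key.
rows = [
    {'number': 1, 'favorite': False, 'shape': 'round', 'color': 'red'},
    {'number': 1, 'favorite': False, 'shape': 'round', 'color': 'green'},
    {'number': 1, 'favorite': False, 'shape': 'square', 'color': 'red'},
]
merged = group_unique(rows, ('number', 'favorite', 'shape'), 'color')
```

Using a list with a membership check keeps the colors in first-seen order, at the cost of a linear scan per row. A set would be faster for groups with many distinct values.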
This would not retain order (unless python3.6), as the question seems to require. – cs95 Jan 18 '19 at 09:41

U13-Forward · Answer 3 · 2019-01-18T08:02:54.013

Or use groupby from itertools:

import itertools
lst = [
{'number': 1, 'favorite': False, 'color': 'red'},
{'number': 1, 'favorite': False, 'color': 'green'},
{'number': 1, 'favorite': False, 'color': 'red'},
{'number': 1, 'favorite': True, 'color': 'red'},
{'number': 2, 'favorite': False, 'color': 'red'}]
key = lambda x: (x['number'], x['favorite'])
# groupby only merges consecutive items, so sort by the same key first
l = [list(y) for x, y in itertools.groupby(sorted(lst, key=key), key)]
print([{k: (v if k != 'color' else list(set(x['color'] for x in i))) for k, v in i[0].items()}
       for i in l])

Output:

[{'number': 1, 'favorite': False, 'color': ['green', 'red']}, {'number': 1, 'favorite': True, 'color': ['red']}, {'number': 2, 'favorite': False, 'color': ['red']}]

edited Jan 18 '19 at 08:02

answered Jan 18 '19 at 07:18


Do you know that your solution won't work if the input list isn't already sorted (w.r.t. number and favorite)? Like if the list contained first dict element as number=1, and favorite=False, second as number=1, and favorite=True, and then number=1, and favorite=False. – Ahmad Khan Jan 18 '19 at 07:34
Try with this input list: `lst = [ {'number': 1, 'favorite': False, 'color': 'red'}, {'number': 1, 'favorite': True, 'color': 'red'}, {'number': 1, 'favorite': False, 'color': 'green'}, {'number': 1, 'favorite': False, 'color': 'red'}, {'number': 2, 'favorite': False, 'color': 'red'}]` – Ahmad Khan Jan 18 '19 at 07:34
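The concern raised in these comments comes from how itertools.groupby works: it only merges items that are adjacent, so the input has to be sorted by the grouping key first. A small sketch showing the difference on a shortened version of the list from the comment (the helper name `group_key` is illustrative):

```python
from itertools import groupby

# Two rows share (1, False) but are separated by a (1, True) row.
rows = [
    {'number': 1, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 1, 'favorite': False, 'color': 'green'},
]


def group_key(row):
    return (row['number'], row['favorite'])


# Without sorting, the (1, False) key shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=group_key)]

# Sorting by the same key first makes equal keys adjacent.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=group_key), key=group_key)]
```

Here `unsorted_keys` is `[(1, False), (1, True), (1, False)]` while `sorted_keys` is `[(1, False), (1, True)]`. Sorting also means the result follows key order, not the original insertion order.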

score 1 · Answer 4 · answered Jan 18 '19 at 15:25

You can use an ordered dictionary with default set values.¹ Then iterate your list of dictionaries, using (number, favorite) as keys. This works since tuples are hashable and therefore permitted to be used as dictionary keys.

It's good practice to use a consistent structure. So, instead of having strings for single values and sets for multiple, use sets throughout:

from collections import OrderedDict, defaultdict

class DefaultOrderedDict(OrderedDict):
    def __missing__(self, k):
        self[k] = set()
        return self[k]

d = DefaultOrderedDict()  # Python 3.7+: d = defaultdict(set)

for i in lst:
    d[(i['number'], i['favorite'])].add(i['color'])

res = [{'number': num, 'favorite': fav, 'color': col} for (num, fav), col in d.items()]

print(res)
# [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
#  {'color': {'red'}, 'favorite': True, 'number': 1},
#  {'color': {'red'}, 'favorite': False, 'number': 2}]

If you insist on having different types depending on the number of colours, you can redefine the list comprehension to use a conditional (ternary) expression:

res = [{'number': num, 'favorite': fav, 'color': next(iter(col)) if len(col) == 1 else col}
       for (num, fav), col in d.items()]

print(res)
# [{'color': {'green', 'red'}, 'favorite': False, 'number': 1},
#  {'color': 'red', 'favorite': True, 'number': 1},
#  {'color': 'red', 'favorite': False, 'number': 2}]

¹ The point is noteworthy in Python versions prior to 3.7, where dictionaries are not guaranteed to be insertion ordered. With Python 3.7+, you can take advantage of insertion ordering and just use dict or a subclass of dict such as collections.defaultdict.
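As a small illustration of the footnote, assuming Python 3.7 or later, a plain collections.defaultdict keeps its keys in the order they were first inserted, so no OrderedDict subclass is needed. The sample rows below are made up so that the first-seen order differs from sorted order.

```python
from collections import defaultdict

# Illustrative rows: the (2, False) group is seen before the (1, True) group.
rows = [
    {'number': 2, 'favorite': False, 'color': 'red'},
    {'number': 1, 'favorite': True, 'color': 'red'},
    {'number': 2, 'favorite': False, 'color': 'blue'},
]

colors_by_key = defaultdict(set)
for row in rows:
    colors_by_key[(row['number'], row['favorite'])].add(row['color'])

# Keys come back in first-insertion order, not sorted order (Python 3.7+).
keys_in_order = list(colors_by_key)
```

Here `keys_in_order` is `[(2, False), (1, True)]`, matching the order in which each combination first appeared.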

score 0 · Answer 5 · answered Jan 18 '19 at 07:09

Here's one way to do it.

I first build a dict using a tuple as a composite key, then make a new list out of that dict. You can write comprehensions to shorten and optimize it further. Hope it helps.

new_dict = {}

for item in lst:
    key = (item['number'], item['favorite'])
    try:  # if the key already exists, append to its list
        new_dict[key].append(item['color'])
    except KeyError:  # otherwise create a new entry for that key
        new_dict[key] = [item['color']]


final_list = []
for k, v in new_dict.items(): # keep appending dicts to our list
    final_list.append({'number': k[0], 'favorite': k[1], 'color':set(v)})

print(final_list)

Outputs:

[{'number': 1, 'favorite': False, 'color': {'green', 'red'}}, {'number': 1, 'favorite': True, 'color': {'red'}}, {'number': 2, 'favorite': False, 'color': {'red'}}]

score 0 · Answer 6 · answered Jan 22 '19 at 16:00

A friend of mine made the following function to solve this problem, without using any external libraries:

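def uniqifyColors(l):
    # Wrap each color string in a list so that colors can be merged in place.
    for elem in l:
        if not isinstance(elem['color'], list):
            elem['color'] = [elem['color']]
    # Copy every color from entries with the same number and favorite into each entry.
    for elem in l:
        for item in l:
            if elem['number'] == item['number'] and elem['favorite'] == item['favorite']:
                for clr in item['color']:
                    if clr not in elem['color']:
                        elem['color'].append(clr)
    return l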

After using this Python function, he simply did a trivial uniqify to get the unique results from the list. It does not, however, keep a single color as a string, but rather a list with a single element.
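The answer does not show the final uniqify step, so the following is only a guess at what it might look like. Dictionaries holding lists are not hashable, so this sketch builds a hashable fingerprint for each entry. The function name `unique_dicts` is hypothetical, and sorting the colors is an assumption made so that lists holding the same colors in a different order count as duplicates.

```python
def unique_dicts(rows):
    """Keep the first occurrence of each (number, favorite, colors) combination."""
    seen = set()
    unique = []
    for row in rows:
        # Sort the colors so ['red', 'green'] and ['green', 'red'] compare equal.
        fingerprint = (row['number'], row['favorite'], tuple(sorted(row['color'])))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Illustrative input: entries whose colors have already been merged into lists.
merged = [
    {'number': 1, 'favorite': False, 'color': ['red', 'green']},
    {'number': 1, 'favorite': False, 'color': ['green', 'red']},
    {'number': 1, 'favorite': True, 'color': ['red']},
]
unique = unique_dicts(merged)
```

With this input, `unique` keeps the first and third entries and drops the second, since it differs from the first only in the order of its colors.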
