12

I have two lists of dictionaries (returned as Django querysets). Each dictionary has an ID value. I'd like to merge the two into a single list of dictionaries, based on the ID value.

For example:

list_a = [{'user__name': u'Joe', 'user__id': 1},
          {'user__name': u'Bob', 'user__id': 3}]
list_b = [{'hours_worked': 25, 'user__id': 3},
          {'hours_worked': 40, 'user__id': 1}]

and I want a function to yield:

list_c = [{'user__name': u'Joe', 'user__id': 1, 'hours_worked': 40},
          {'user__name': u'Bob', 'user__id': 3, 'hours_worked': 25}]

Additional points to note:

  • The IDs in the lists may not be in the same order (as with the example above).
  • The lists will probably have the same number of elements, but I want to account for the case where they don't, keeping all the values from list_a (essentially list_a LEFT OUTER JOIN list_b USING (user__id)).
  • I've tried doing this in SQL but it's not possible since some of the values are aggregates based on some exclusions.
  • It's safe to assume there will only be at most one dictionary with the same user__id in each list due to the database queries used.

Many thanks for your time.

edkay

2 Answers

19

I'd use itertools.groupby to group the elements:

import itertools

lst = sorted(itertools.chain(list_a, list_b), key=lambda x: x['user__id'])
list_c = []
for k, v in itertools.groupby(lst, key=lambda x: x['user__id']):
    d = {}
    for dct in v:        # merge every dict in this id's group into one
        d.update(dct)
    list_c.append(d)
    # could also do:
    # list_c.append(dict(itertools.chain.from_iterable(dct.items() for dct in v)))
    # although that might be a little harder to read.
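
For what it's worth, running this on the question's list_a and list_b should yield exactly the list_c the OP asked for, ordered by user__id because of the sort. (And since every ID from either list gets its own group, unmatched entries survive too, so this is effectively a full outer join rather than list_a-left only.)

>>> list_c
[{'user__name': u'Joe', 'user__id': 1, 'hours_worked': 40},
 {'user__name': u'Bob', 'user__id': 3, 'hours_worked': 25}]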

If you have an aversion to lambda functions, you can always use operator.itemgetter('user__id') instead. (It's probably slightly more efficient, too.)

To demystify lambda/itemgetter a little bit, note that:

def foo(x):
    return x['user__id']

is the same thing* as either of the following:

foo = operator.itemgetter('user__id')
foo = lambda x: x['user__id']

*There are a few differences, but they're not important for this problem
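
(One concrete difference, for the curious: itemgetter can fetch several keys at once and return them as a tuple, something the lambda version would have to spell out by hand. The sample dict below is made up purely for illustration.)

import operator

row = {'user__id': 1, 'hours_worked': 40}   # sample record, not from the question
get_pair = operator.itemgetter('user__id', 'hours_worked')
get_pair(row)  # -> (1, 40)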

mgilson
  • `operator.itemgetter()` (http://docs.python.org/3/library/operator.html#operator.itemgetter) might be a good call here. – Gareth Latty Dec 20 '12 at 15:11
  • one-liner `[dict(y for x in g for y in x.items()) for k,g in groupby(lis,key=lambda x:x['user__id'])]` – Ashwini Chaudhary Dec 20 '12 at 15:17
  • Great solution, but worth noting that this will trample all but the last value in the result set for the same `user__id` if there are multiple rows for that `user__id` that contain the same value key. Probably fine for this question, but could be a tricky problem if it is a concern. – Silas Ray Dec 20 '12 at 15:19
  • 1
    @sr2222 -- You're right, it will do that, but if that is a concern, then this isn't a well-posed problem (OP never said how that should be handled) :) – mgilson Dec 20 '12 at 15:20
  • Well, the specs he provides don't explicitly state that such a condition can't occur, but given the type of data he appears to be working with, it's probably a safe assumption that there are no duplicates. – Silas Ray Dec 20 '12 at 15:21
  • Wow, impressed by the number and speed of responses here. Very much appreciated. I've tried the code originally suggested by @mgilson and it works a charm. Now to do a bit more reading to fully understand how it works :) – edkay Dec 20 '12 at 15:29
  • @sr2222 -- Sure he doesn't specify that it can't occur (maybe it can). But he doesn't specify *how* it should be handled should the case arise. And that's not something that I think we could reasonably guess (as far as I can see it, keeping the last one is just as good of a way to handle it as anything else). – mgilson Dec 20 '12 at 15:29
  • @sr2222 Good shout. Thankfully for this situation there won't be any duplicate `user__id` value keys due to the db query used. – edkay Dec 20 '12 at 15:31
  • Sorting, grouping, and itemgetter all seem like unnecessary overhead for some dicts. – Marcin Dec 20 '12 at 15:44
  • @Marcin -- Maybe. `grouping` really doesn't introduce any more overhead than your simple for loop. `itemgetter` doesn't introduce much more overhead than is already present in `__getitem__`, so `sorting` is the only stage which is really "unnecessary" here. However, if OP wants to have a list at the end of the day, it's possible that having a sorted list is desirable in which case OP would need to sort your output as well. (that said, your output would be smaller, so it would be a faster sort). Anyway, yours is a nice answer. +1 to it. – mgilson Dec 20 '12 at 15:52
  • @mgilson Quite. It's not just computational overhead, but also simple code length and readability. – Marcin Dec 20 '12 at 15:59
6
from collections import defaultdict
from itertools import chain

list_a = [{'user__name': u'Joe', 'user__id': 1},
          {'user__name': u'Bob', 'user__id': 3}]
list_b = [{'hours_worked': 25, 'user__id': 3},
          {'hours_worked': 40, 'user__id': 1}]

collector = defaultdict(dict)

# one pass over both lists; each dict merges into the entry for its user__id
for collectible in chain(list_a, list_b):
    collector[collectible['user__id']].update(collectible)

list_c = list(collector.values())

As you can see, this just uses another dict to merge the existing dicts. The trick with defaultdict is that it takes out the drudgery of creating a dict for a new entry.

There is no need to group or sort these inputs. The dict takes care of all of that.
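
A minimal illustration of that trick (the key 42 and the name are made up):

from collections import defaultdict

collector = defaultdict(dict)
collector[42].update({'user__name': u'Ann'})  # the entry for 42 springs into existence
# a plain dict would blow up on the same line:
# {}[42].update({'user__name': u'Ann'})  # KeyError: 42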

A truly bulletproof solution would catch the potential KeyError in case an input dict does not have a 'user__id' key, or use a default value to collect up all of the dicts without such a key.
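
For instance, one possible sketch of the default-value variant, assuming a shared sentinel bucket is an acceptable policy for keyless records:

from collections import defaultdict
from itertools import chain

MISSING = object()  # hypothetical sentinel for dicts lacking 'user__id'
collector = defaultdict(dict)

for collectible in chain(list_a, list_b):
    # .get sidesteps the KeyError; all keyless dicts merge under MISSING
    collector[collectible.get('user__id', MISSING)].update(collectible)

list_c = list(collector.values())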

Marcin