
I'm trying to group these dicts by their name value, using each name as a key in an outer dict. For each name, the inner dict should have the source values as keys and the number of times each source occurs as the value.

data = [
{'name':'Gill', 'source':'foo'},
{'name':'Gill', 'source':'foo'},
{'name':'Gill', 'source':'foo'},
{'name':'Gill', 'source':'bar'},
{'name':'Gill', 'source':'bar'},
{'name':'Gill', 'source':'bar'},
{'name':'Gill', 'source':'bar'},
{'name':'Gill', 'source':'bar'},
{'name':'Dave', 'source':'foo'},
{'name':'Dave', 'source':'foo'},
{'name':'Dave', 'source':'foo'},
{'name':'Dave', 'source':'foo'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'},
{'name':'Dave', 'source':'egg'}
]

How do I achieve the below output?

{'Gill': {'foo':3, 'bar':5}, 'Dave': {'foo':4, 'egg':7}}

I think it may be possible with a one-liner...

Slopax

2 Answers

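Use itertools.groupby to group by names, then collections.Counter to count the source categories belonging to each name. Note that groupby only groups consecutive items, so this relies on the data already being sorted (or at least grouped) by name, as it is in the question: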

from collections import Counter
from itertools import groupby

f = lambda x: x['name']
dct = {k: Counter(d['source'] for d in g) for k, g in groupby(data, f)}
print(dct)
# {'Gill': Counter({'bar': 5, 'foo': 3}), 'Dave': Counter({'egg': 7, 'foo': 4})}
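
If the records are not already grouped by name, or plain dicts are wanted instead of Counter objects, a minimal sketch along the same lines could sort first and convert each Counter with dict(). The shortened, shuffled sample data and the by_name helper are made up for illustration and are not part of the original answer:

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Shortened, deliberately unsorted sample in the same shape as the question's data.
data = [
    {'name': 'Gill', 'source': 'foo'},
    {'name': 'Dave', 'source': 'egg'},
    {'name': 'Gill', 'source': 'bar'},
    {'name': 'Dave', 'source': 'egg'},
    {'name': 'Gill', 'source': 'bar'},
]

by_name = itemgetter('name')  # hypothetical helper name

# groupby only merges consecutive items, so sort by the same key first.
dct = {
    name: dict(Counter(d['source'] for d in group))
    for name, group in groupby(sorted(data, key=by_name), key=by_name)
}
print(dct)
```

Sorting costs O(n log n), and the names come out in sorted order instead of first-seen order, which may or may not matter for your use case.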
Selcuk
Moses Koledoye
  • Of course, this assumes that the data is sorted by the `'name'` key. – vaultah Sep 21 '17 at 16:04
  • Thank you, this is great. However, my actual dataset has a lot more keys than 'name' and 'source', which I haven't mentioned here (I thought it'd be fine), so I may need to strip it down to just the two. The groupby(data, f) call seems to have problems with it. Is there a way to make this work if a 3rd key were introduced, but have it disregard said key? (I am being picky) – Slopax Sep 21 '17 at 16:10
  • @Slopax I don't see how a third key would create a problem if you don't actually need it. – Moses Koledoye Sep 21 '17 at 16:26
  • @MosesKoledoye I am mistaken, having just 2 keys as shown here in my example seems to produce different results, very strange. That's a headache for tomorrow! – Slopax Sep 21 '17 at 16:28

This is obviously not a one-liner, but it is simple and pretty straightforward. It works for any number of values, and it does not need the data to be sorted.

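results = {}
key = 'name'
for line in data:
    tracked_key = line[key]
    # make sure there is an inner dict for this name
    results.setdefault(tracked_key, {})
    # .items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for k, v in line.items():
        if k == key:
            continue
        # count the value of every key other than 'name'
        results[tracked_key].setdefault(v, 0)
        results[tracked_key][v] += 1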
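
If the real records carry extra keys that should not be counted, as raised in the comments above, one possible variant is to count only the 'source' value in a single pass with a defaultdict of Counters. This is a sketch under that assumption, not the original answer's code; the 'id' key and the sample rows are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical records with an extra 'id' key that should be ignored.
data = [
    {'name': 'Gill', 'source': 'foo', 'id': 1},
    {'name': 'Dave', 'source': 'egg', 'id': 2},
    {'name': 'Gill', 'source': 'bar', 'id': 3},
    {'name': 'Gill', 'source': 'foo', 'id': 4},
]

# Each new name automatically gets its own empty Counter.
counts = defaultdict(Counter)
for row in data:
    # Only 'name' and 'source' are read, so other keys have no effect.
    counts[row['name']][row['source']] += 1

# Convert to plain nested dicts to match the output format in the question.
result = {name: dict(c) for name, c in counts.items()}
print(result)
# {'Gill': {'foo': 2, 'bar': 1}, 'Dave': {'egg': 1}}  (key order as on Python 3.7+)
```

Because nothing is sorted or grouped up front, the order of the input records does not matter here.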
zujio