Python dict group and sum multiple values

Question

I have a set of data in the list of dict format like below:

data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]

I'm trying to group by 'name' and sum the 'tea' separately and 'coffee' separately

The final grouped data must be in the this format:

grouped_data = [
    {'name': 'A', 'tea':7, 'coffee':9},
    {'name': 'B', 'tea':16, 'coffee':5},
]

I tried some steps:

from collections import Counter
c = Counter()
for v in data:
    c[v['name']] += v['tea']

my_data = [{'name': name, 'tea':tea} for name, tea in c.items()]
for e in my_data:
    print e

The above step returned the following output:

{'name': 'A', 'tea':7,}
{'name': 'B', 'tea':16}

Only I can sum the key 'tea', I'm not able to get the sum for the key 'coffee', can you guys please help to solve this solution to get the grouped_data format

Why are you using a `Counter`? Why not just use a dictionary? — blacksite, Nov 01 '17 at 12:51
Look at `zip` function https://docs.python.org/3.4/library/functions.html#zip — Grynets, Nov 01 '17 at 12:52
@blacksite Thanks for your response , I just tried with that, but I cant get the solution — Ajay Kumar, Nov 01 '17 at 12:52
@cᴏʟᴅsᴘᴇᴇᴅ I tried with pandas, but I am not able to convert the datafarame to dict format that I am expecting — Ajay Kumar, Nov 01 '17 at 12:54
@AjayKumar if you're at all curious I created a function without imports — ragardner, Nov 01 '17 at 14:17

score 9 · Accepted Answer · answered Nov 01 '17 at 12:56

9

Using pandas:

df = pd.DataFrame(data)
df

   coffee name  tea
0       6    A    5
1       3    A    2
2       1    B    7
3       4    B    9


g = df.groupby('name', as_index=False).sum()
g

  name  coffee  tea
0    A       9    7
1    B       5   16

And, the final step, df.to_dict:

d = g.to_dict('r')

d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]

answered Nov 01 '17 at 12:56

cs95

379,657
97
704
746

@COLDSPEED , thank you so much for your answered, I missed the 'as_index=False' in my code, it works well. – Ajay Kumar Nov 01 '17 at 13:00
Damn... Was *thisclose* to that same result, but forgot about `as_index`! – blacksite Nov 01 '17 at 13:00

score 2 · Answer 2 · answered Nov 01 '17 at 13:07

You can try this:

data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [(a, list(b)) for a, b in itertools.groupby([i.items() for i in data], key=lambda x:dict(x)["name"])] 
new_final_data = [{i[0][0]:sum(c[-1] for c in i if isinstance(c[-1], int)) if i[0][0] != "name" else i[0][-1] for i in zip(*b)} for a, b in final_data]

Output:

[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}

blacksite · Answer 3 · 2017-11-01T13:00:17.397

1

Using pandas, this is pretty easy to do:

import pandas as pd

data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]

df = pd.DataFrame(data)
df.groupby(['name']).sum()

      coffee  tea
name             
A          9    7
B          5   16

Here's one way to get it into your dict format:

grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)


grouped_data
Out[15]: [{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]

But COLDSPEED got the native pandas solution with the as_index=False config...

edited Nov 01 '17 at 13:00

answered Nov 01 '17 at 12:54

blacksite

12,086
10
64
109

Thanks for your response , I too tried with pandas and came till this, but I am not able to convert the datafarame to dict format that I am expecting – Ajay Kumar Nov 01 '17 at 12:56
thanks this solutions also works – Ajay Kumar Nov 01 '17 at 13:06

karthik reddy · Answer 4 · 2017-11-01T13:05:51.243

1

Click here to see snap shot

import pandas as pd df = pd.DataFrame(data) df2=df.groupby('name').sum() df2.to_dict('r')

edited Nov 01 '17 at 13:05

answered Nov 01 '17 at 12:58

karthik reddy

479
4
12

@karthick reddy, Thanks I too tried these steps, but I am not able to convert the datafarame to dict format that I am expecting, now I got solution. – Ajay Kumar Nov 01 '17 at 13:04
Your welcome!!!! – karthik reddy Nov 01 '17 at 13:08

ragardner · Answer 5 · 2017-11-01T14:21:07.917

Here is a method I created, you can input the key you want to group by:

def group_sum(key,list_of_dicts):
    d = {}
    for dct in list_of_dicts:
        if dct[key] not in d:
            d[dct[key]] = {}
        for k,v in dct.items():
            if k != key:
                if k not in d[dct[key]]:
                    d[dct[key]][k] = v
                else:
                    d[dct[key]][k] += v
    final_list = []
    for k,v in d.items():
        temp_d = {key: k}
        for k2,v2 in v.items():
            temp_d[k2] = v2
        final_list.append(temp_d)
    return final_list


data = [
    {'name': 'A', 'tea':5, 'coffee':6},
    {'name': 'A', 'tea':2, 'coffee':3},
    {'name': 'B', 'tea':7, 'coffee':1},
    {'name': 'B', 'tea':9, 'coffee':4},
]

grouped_data = group_sum("name",data)
print (grouped_data)

result:

[{'coffee': 5, 'name': 'B', 'tea': 16}, {'coffee': 9, 'name': 'A', 'tea': 7}]

I guess this would be slower when summing thousands of dicts compared to pandas, maybe not, I don't know. It also doesn't seem to maintain order unless you use ordereddict or python 3.6

Python dict group and sum multiple values

5 Answers5

Linked