Python list group by key

Question

I have a list:

x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]

I'd like to sum all the second values that share the same first value.

For example, 17 should give 28.

I tried building a dict like this:

d = {}
for row in x:
    if row[0] not in d:
        d[row[0]] = []
    d[row[0]].append(row[1])

The result is

{17: [2, 2, 4, 4, 4, 4, 4, 4],
 18: [4, 0, 4], 19: [4, 4],
 20: [4, 4]}

I couldn't find a way to sum the values.

Since you are interested in the sum and not in the list, replace `d[row[0]] = []` with `d[row[0]] = 0` and `d[row[0]].append(row[1])` with `d[row[0]] += row[1]` to actually sum the elements instead of listing them. — Stef, Jul 05 '21 at 15:02
Related question: [Reduce by key in python](https://stackoverflow.com/questions/29933189/reduce-by-key-in-python) — Stef, Jul 05 '21 at 15:09

Mad Physicist · Answer 1 · 2021-07-05T15:04:46.133

2

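You can use itertools.groupby once the list is sorted by key, since groupby only merges adjacent items (sorted takes care of that). Pass the same key function to both calls, and sum only the second element of each row:

from itertools import groupby
from operator import itemgetter

# Without key=, groupby would group on whole rows instead of the first element.
d = {key: sum(v for _, v in grp)
     for key, grp in groupby(sorted(x, key=itemgetter(0)), key=itemgetter(0))}

In this case itemgetter(0) is a shortcut for lambda row: row[0]. It is implemented in C in CPython, which typically makes it a little faster as a key function.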
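To see why the sort matters, here is a minimal sketch using the list from the question. The helper name `sum_by_key` is made up for this illustration. It compares grouping the sorted list with grouping the raw list, where each key shows up in several separate runs.

```python
from itertools import groupby
from operator import itemgetter

x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]

first = itemgetter(0)


def sum_by_key(rows):
    """Sum the second item of each row, grouped by the first item."""
    ordered = sorted(rows, key=first)  # groupby only merges adjacent rows
    return {key: sum(value for _, value in group)
            for key, group in groupby(ordered, key=first)}


totals = sum_by_key(x)
print(totals)  # {17: 28, 18: 8, 19: 8, 20: 8}

# Without sorting, every run of equal keys becomes its own group, and the
# dict comprehension keeps only the last run seen for each key.
unsorted_totals = {key: sum(value for _, value in group)
                   for key, group in groupby(x, key=first)}
print(unsorted_totals)  # {17: 4, 18: 4, 19: 4, 20: 4}
```

The second result is silently wrong rather than an error, which is why sorting first is worth the extra pass.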

In your original case, you could either maintain a running sum or sum afterwards. To sum the dictionary you already have:

d = {k: sum(v) for k, v in d.items()}

To maintain a running sum:

d = {}
for k, v in x:
    if k in d:
        d[k] += v
    else:
        d[k] = v

A shorter way of doing the same thing would be to use dict.setdefault:

d = {}
for k, v in x:
    d[k] = d.setdefault(k, 0) + v
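As a quick sanity check, the variants above can be run side by side on the list from the question. This is only a sketch. The third variant uses dict.get, which is not mentioned above but reads the default without inserting it. The variable names are chosen for this example.

```python
x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]

# Variant 1: collect a list per key, then sum each list afterwards.
grouped = {}
for k, v in x:
    grouped.setdefault(k, []).append(v)
summed_afterwards = {k: sum(vs) for k, vs in grouped.items()}

# Variant 2: running sum with dict.setdefault.
running = {}
for k, v in x:
    running[k] = running.setdefault(k, 0) + v

# Variant 3: running sum with dict.get (same idea, no insert on lookup).
with_get = {}
for k, v in x:
    with_get[k] = with_get.get(k, 0) + v

print(summed_afterwards)                         # {17: 28, 18: 8, 19: 8, 20: 8}
print(summed_afterwards == running == with_get)  # True
```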

edited Jul 05 '21 at 15:04

answered Jul 05 '21 at 14:37

Mad Physicist


Can you please share your thought, about why itemgetter(0) is more efficient than lambda? I am not questioning it, just trying to understand. I would write with lambda, because to me it's more readable and explicit that, for each inner list, I want to sort/group/filter by the first/second element. – Ahsanul Haque Jul 05 '21 at 14:42
@AhsanulHaque. Sure. `itemgetter(0)` is specifically written in C (in CPython) to make it a more efficient way of providing a sort key. I've added links to the docs for all the functions I'm recommending. – Mad Physicist Jul 05 '21 at 14:47
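If you want to measure the difference yourself, a rough sketch with timeit could look like the following. The numbers depend on your machine and Python version, so no timings are shown here, and on a list this small the gap may be hard to notice.

```python
import timeit
from operator import itemgetter

x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]

get_first = itemgetter(0)


def lambda_first(row):
    return row[0]


# Both key functions produce the same ordering (sorted is stable).
by_itemgetter = sorted(x, key=get_first)
by_lambda = sorted(x, key=lambda_first)
print(by_itemgetter == by_lambda)  # True

# Time many repetitions of each sort; only the key function differs.
t_itemgetter = timeit.timeit(lambda: sorted(x, key=get_first), number=10_000)
t_lambda = timeit.timeit(lambda: sorted(x, key=lambda_first), number=10_000)
print(f"itemgetter: {t_itemgetter:.4f}s, lambda-style: {t_lambda:.4f}s")
```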

score 1 · Answer 2 · answered Jul 05 '21 at 14:55

1

Using defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for lst in x:
...     a, b = lst
...     d[a] += b
...
>>> d
defaultdict(<class 'int'>, {17: 28, 18: 8, 19: 8, 20: 8})

answered Jul 05 '21 at 14:55

BioGeek



You can just do `for a, b in x:` – Mad Physicist Jul 05 '21 at 15:00
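With the unpacking suggested in the comment, the defaultdict version could be written as a script like this. It is a sketch using the list from the question, and the names `key`, `value` and `totals` are just for illustration.

```python
from collections import defaultdict

x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]

totals = defaultdict(int)  # missing keys start at int() == 0
for key, value in x:       # unpack each two-element row directly
    totals[key] += value

print(dict(totals))  # {17: 28, 18: 8, 19: 8, 20: 8}
```

Converting with dict(totals) at the end gives a plain dict, which avoids accidentally creating new keys on later lookups.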

Marcel Preda · Answer 3 · 2021-07-05T15:09:05.447

0

Here is a working version:

x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]
d = {}
for row in x:
    if row[0] not in d:
        d[row[0]] = 0
    d[row[0]] += row[1]

print(d)

The output is:

{17: 28, 18: 8, 19: 8, 20: 8}

edited Jul 05 '21 at 15:09

answered Jul 05 '21 at 14:37

Marcel Preda



`d[row[0]] = row[1]` should be `d[row[0]] = 0` – Mad Physicist Jul 05 '21 at 14:40
You're adding the first element from each group twice otherwise. You can trivially see this for `20`: `4 + 4 = 8` but you're reporting `4 + 4 + 4 = 12` – Mad Physicist Jul 05 '21 at 14:48
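The double counting described in these comments can be reproduced with a small sketch. The `seed_with_first_value` flag is a hypothetical switch added only for this demo: it toggles between starting each key at its first value and starting it at 0.

```python
x = [[17, 2], [18, 4], [17, 2], [18, 0], [19, 4],
     [17, 4], [19, 4], [17, 4], [20, 4], [17, 4],
     [20, 4], [17, 4], [17, 4], [18, 4], [17, 4]]


def sum_by_key(rows, seed_with_first_value):
    """Sum second items per key.

    seed_with_first_value=True starts each key at its first value and then
    adds that value again, which is the mistake the comments point out.
    """
    totals = {}
    for key, value in rows:
        if key not in totals:
            totals[key] = value if seed_with_first_value else 0
        totals[key] += value
    return totals


buggy = sum_by_key(x, seed_with_first_value=True)
correct = sum_by_key(x, seed_with_first_value=False)
print(buggy)    # {17: 30, 18: 12, 19: 12, 20: 12}
print(correct)  # {17: 28, 18: 8, 19: 8, 20: 8}
```

Each buggy total is off by exactly the first value seen for that key, for example 4 + 4 + 4 = 12 instead of 4 + 4 = 8 for 20.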
