Grouping Python tuple list

Question

I have a list of (label, count) tuples like this:

[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]

From that I want to sum the counts for each label (identical labels are always adjacent) and return a list in the original label order:

[('grape', 103), ('apple', 29), ('banana', 3)]

I know I could solve it with something like:

def group(l):
    result = []
    if l:
        this_label = l[0][0]
        this_count = 0
        for label, count in l:
            if label != this_label:
                result.append((this_label, this_count))
                this_label = label
                this_count = 0
            this_count += count
        result.append((this_label, this_count))
    return result

But is there a more Pythonic / elegant / efficient way to do this?

score 44 · Accepted Answer · edited Mar 18 '21 at 18:17


itertools.groupby can do what you want:

import itertools
import operator

L = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
     ('apple', 4), ('banana', 3)]

def accumulate(l):
    it = itertools.groupby(l, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)

print(list(accumulate(L)))
# [('grape', 103), ('apple', 29), ('banana', 3)]
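To see why the "same labels always adjacent" guarantee matters, here is a small sketch using hypothetical data in which 'grape' is not contiguous. `groupby` only merges *consecutive* items with equal keys, so a label that reappears later starts a new group:

```python
import itertools
import operator

# Hypothetical data: 'grape' appears in two separate runs
data = [('grape', 100), ('apple', 15), ('grape', 3)]

groups = [
    (key, sum(count for _, count in sub))
    for key, sub in itertools.groupby(data, operator.itemgetter(0))
]
print(groups)
# [('grape', 100), ('apple', 15), ('grape', 3)]
```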

edited by Tomerikoo · answered Feb 12 '10 at 01:26 by Thomas Wouters


I like the use of `operator.itemgetter` in place of `lambda`. – jathanism Feb 12 '10 at 01:48

This requires the list to be sorted on the first key. If it isn't already sorted, then the defaultdict approach from ghostdog74 is a much better solution. – Martijn Pieters Oct 10 '16 at 21:05

Why would you use `operator` instead of `lambda`? – Adrian Guerra May 29 '20 at 17:53
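On the `operator` vs `lambda` question: both key functions return the first element of each tuple, so the grouping is identical. `itemgetter` is simply a ready-made callable, and it can also fetch several indices at once. A quick check, with the question's list redefined so the snippet runs on its own (the helper name `totals` is just for illustration):

```python
import itertools
import operator

L = [('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10),
     ('apple', 4), ('banana', 3)]

def totals(key_func):
    # Sum the counts of each run of items that share the same key
    return [(k, sum(c for _, c in grp)) for k, grp in itertools.groupby(L, key_func)]

by_itemgetter = totals(operator.itemgetter(0))
by_lambda = totals(lambda item: item[0])
print(by_itemgetter == by_lambda)  # True

# With several indices, itemgetter returns a tuple of the selected items
print(operator.itemgetter(1, 0)(('grape', 100)))  # (100, 'grape')
```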

score 8 · Answer 2 · edited Feb 12 '10 at 01:31

Using itertools and a list comprehension:

import itertools

[(key, sum(num for _, num in value))
    for key, value in itertools.groupby(l, lambda x: x[0])]

Edit: as gnibbler pointed out, if l isn't already sorted, replace it with sorted(l).
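Keep in mind that sorting groups the labels alphabetically, so the output no longer follows the input's label order. A short sketch with a hypothetical unsorted list:

```python
import itertools

# Hypothetical input where equal labels are not adjacent
l = [('grape', 100), ('apple', 15), ('grape', 3), ('banana', 3), ('apple', 10)]

# sorted(l) makes equal labels adjacent, but in alphabetical order
result = [(key, sum(num for _, num in value))
          for key, value in itertools.groupby(sorted(l), lambda x: x[0])]
print(result)
# [('apple', 25), ('banana', 3), ('grape', 103)]
```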

answered Feb 12 '10 at 01:25 by cobbal


To use groupby you must first ensure that the sequence is pregrouped (all the 'grape' entries adjacent, etc.). One way to do that is to sort the sequence first. – John La Rooy Feb 12 '10 at 01:30
@Thomas Wouters, yes you are correct ("same labels are always adjacent") – John La Rooy Feb 12 '10 at 01:40

score 6 · Answer 3 · answered Feb 12 '10 at 01:45


import collections
d = collections.defaultdict(int)
a = []  # labels in first-seen order
alist = [('grape', 100), ('banana', 3), ('apple', 10), ('apple', 4), ('grape', 3), ('apple', 15)]
for fruit, number in alist:
    if fruit not in a:
        a.append(fruit)
    d[fruit] += number
for f in a:
    print((f, d[f]))

Output:

$ ./python.py
('grape', 103)
('banana', 3)
('apple', 29)

answered by ghostdog74

This does a linear search of `a` for every item, which makes your algorithm `O(n^2)`. Not a good thing. – Shital Shah May 18 '19 at 01:54
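Since Python 3.7, regular dicts (and therefore `defaultdict`, a dict subclass) preserve insertion order, so the separate `a` list and its linear `in` check can be dropped. A sketch of that O(n) variant, assuming Python 3.7+:

```python
import collections

alist = [('grape', 100), ('banana', 3), ('apple', 10), ('apple', 4),
         ('grape', 3), ('apple', 15)]

# defaultdict(int) starts each new label at 0; key order follows first appearance
d = collections.defaultdict(int)
for fruit, number in alist:
    d[fruit] += number

result = list(d.items())
print(result)
# [('grape', 103), ('banana', 3), ('apple', 29)]
```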

score 5 · Answer 4 · answered Feb 12 '10 at 01:49

>>> from itertools import groupby
>>> from operator import itemgetter
>>> L=[('grape', 100), ('grape', 3), ('apple', 15), ('apple', 10), ('apple', 4), ('banana', 3)]
>>> [(x,sum(map(itemgetter(1),y))) for x,y in groupby(L, itemgetter(0))]
[('grape', 103), ('apple', 29), ('banana', 3)]

score 4 · Answer 5 · answered Apr 19 '17 at 12:51


My version, without itertools:
[(k, sum([y for (x,y) in l if x == k])) for k in dict(l).keys()]

answered by Anton Suslov

score 1 · Answer 6 · edited Jun 02 '18 at 10:38

Method

def group_by(my_list):
    result = {}
    for k, v in my_list:
        result[k] = v if k not in result else result[k] + v
    return result

Usage

my_list = [
    ('grape', 100), ('grape', 3), ('apple', 15),
    ('apple', 10), ('apple', 4), ('banana', 3)
]

group_by(my_list) 

# Output: {'grape': 103, 'apple': 29, 'banana': 3}

To convert the result to a list of tuples, use list(group_by(my_list).items()).
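For example, turning the dictionary back into the list-of-tuples shape the question asks for (the function is repeated so the snippet runs on its own; this relies on dicts keeping insertion order, guaranteed since Python 3.7):

```python
def group_by(my_list):
    result = {}
    for k, v in my_list:
        # Start at v for a new key, otherwise add v to the running total
        result[k] = v if k not in result else result[k] + v
    return result

my_list = [
    ('grape', 100), ('grape', 3), ('apple', 15),
    ('apple', 10), ('apple', 4), ('banana', 3)
]

pairs = list(group_by(my_list).items())
print(pairs)
# [('grape', 103), ('apple', 29), ('banana', 3)]
```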

answered May 16 '18 at 07:12 by Shameem

score 0 · Answer 7 · answered Jul 10 '16 at 18:29

Or a simpler, more readable answer (without itertools):

pairs = [('foo', 1), ('bar', 2), ('foo', 2), ('bar', 3)]

def sum_pairs(pairs):
    sums = {}
    for label, count in pairs:
        sums.setdefault(label, 0)  # start each new label at 0
        sums[label] += count
    return sums.items()

print(list(sum_pairs(pairs)))

score 0 · Answer 8 · edited Jul 08 '22 at 14:45


Simpler answer without any third-party libraries:

dct = {}

for key, value in alist:
    if key not in dct:
        dct[key] = value
    else:
        dct[key] += value

edited by ChrisGPT was on strike · answered Jul 08 '22 at 09:47 by Md Shahbaz

I don't see any third-party libraries here. [`itertools`](https://docs.python.org/library/itertools.html), [`operator`](https://docs.python.org/library/operator.html), and [`collections`](https://docs.python.org/library/collections.html) are all part of the standard library. They come with Python. – ChrisGPT was on strike Jul 08 '22 at 14:47
