I have a list of tuples, let's say:

durations = [(1, 5), (2, 3), (1, 6), (3, 1), (3, 12), (7, 8)]

I want to perform a groupby operation on it that returns, for each distinct first element, the sum of the second elements. My desired output would look like this (preferably sorted in descending order by the summed value):

[(3, 13), (1, 11), (7, 8), (2, 3)]

However, when I tried the solution from "Grouping Python tuple list":

import itertools
import operator

def accumulate(l):
    it = itertools.groupby(l, operator.itemgetter(0))
    for key, subiter in it:
        yield key, sum(item[1] for item in subiter)

durations = [(1, 5), (2, 3), (1, 6), (3, 1), (3, 12), (7, 8)]

output = list(accumulate(durations))

I keep getting the wrong output:

[(1, 5), (2, 3), (1, 6), (3, 13), (7, 8)]

(In this example it does not add up (1, 5) and (1, 6) to (1, 11). Running this code on larger data produces many such misses.)

How can I solve this?

Peter
  • You missed [this comment](https://stackoverflow.com/questions/2249036/grouping-python-tuple-list#comment67213367_2249060) by Martin: *"__This requires the list to be sorted on the first key__. If it isn't already sorted, then the defaultdict approach from ghostdog74 is a much better solution"* – Tomerikoo Mar 18 '21 at 18:16
  • Yes, this worked, thanks! – Peter Mar 18 '21 at 18:18
  • This is also mentioned [in the docs](https://docs.python.org/3.7/library/itertools.html#itertools.groupby): *"Generally, the iterable needs to already be sorted on the same key function."* – Tomerikoo Mar 18 '21 at 18:19
  • Yes, thanks for the comment! – Peter Mar 18 '21 at 18:35
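
To illustrate the point made in the comments, here is a minimal sketch that sorts the list on the first element before grouping, so that equal keys end up adjacent and `itertools.groupby` can merge them. The name `accumulate_sorted` is hypothetical and chosen only for this example; it is not from the linked question.

```python
import itertools
import operator

def accumulate_sorted(pairs):
    # groupby only merges *adjacent* items with equal keys,
    # so sort on the same key function first.
    first = operator.itemgetter(0)
    ordered = sorted(pairs, key=first)
    for key, group in itertools.groupby(ordered, key=first):
        yield key, sum(value for _, value in group)

durations = [(1, 5), (2, 3), (1, 6), (3, 1), (3, 12), (7, 8)]
totals = list(accumulate_sorted(durations))
print(totals)  # [(1, 11), (2, 3), (3, 13), (7, 8)]
```

The result is ordered by key rather than by sum, because the sort is on the first element. Sorting costs O(n log n), which is one reason the dict-based approach may be preferable for unsorted input.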

1 Answer

Probably the easiest way is to use a dict as an accumulator. Unlike groupby, this does not require the input to be sorted:

from collections import defaultdict

def accumulate(l):
    d = defaultdict(int)
    for x in l:
        d[x[0]] += x[1]
    return d.items()

durations = [(1, 5), (2, 3), (1, 6), (3, 1), (3, 12), (7, 8)]

print(list(accumulate(durations)))
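
The question also asks for the result sorted in descending order by the summed value, which the code above does not do. One possible way to add that, as a sketch only, is to sort the dict items on the sum. The name `accumulate_desc` is hypothetical and used only for this example.

```python
from collections import defaultdict

def accumulate_desc(pairs):
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Sort the (key, total) pairs by total, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

durations = [(1, 5), (2, 3), (1, 6), (3, 1), (3, 12), (7, 8)]
result = accumulate_desc(durations)
print(result)  # [(3, 13), (1, 11), (7, 8), (2, 3)]
```

Because `sorted` is stable, keys with equal sums keep the order in which they first appeared in the input.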
Mihai