Reduce by key in python

Question

I'm trying to think through the most efficient way to do this in Python.

Suppose I have a list of tuples:

[('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

And suppose I have a function which takes two of these tuples and combines them:

def my_reduce(obj1, obj2):
    return (obj1[0],max(obj1[1],obj2[1]),min(obj1[2],obj2[2]))

How do I perform an efficient reduce by 'key' where the key here could be the first value, so the final result would be something like:

[('dog',12,1), ('cat',16,1)]

Hey @mgoldwasser, 2 years too late but here's another way: https://stackoverflow.com/a/48343896/5858851. BTW, I _am_ the former coworker you think I am. — pault, Jan 19 '18 at 15:08

Anzel · Answer 1 · 2015-04-29T02:47:02.103

12

Alternatively, if you have pandas installed:

import pandas as pd

l = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

pd.DataFrame(data=l, columns=['animal', 'm', 'n']).groupby('animal').agg({'m':'max', 'n':'min'})
Out[6]: 
         m  n
animal       
cat     16  1
dog     12  1

To get the original format:

list(zip(df.index, *df.values.T)) # df is the result above; list() is needed on Python 3, where zip is lazy
Out[14]: [('cat', 16, 1), ('dog', 12, 1)]

edited Apr 29 '15 at 02:47

answered Apr 29 '15 at 02:30


I concur :) ... silly wim and his 0-width spaces :P – Joran Beasley Apr 29 '15 at 02:44

score 7 · Answer 2 · answered Apr 29 '15 at 02:25

I don't think reduce is a good tool for this job, because you will have to first use itertools or similar to group the list by the key. Otherwise you will be comparing cats and dogs and all hell will break loose!
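To make the "cats and dogs" problem concrete, here is a small sketch that applies reduce directly to the ungrouped list from the question. The name mixed is only for illustration, and the functools import is there so the snippet also runs on Python 3.

```python
from functools import reduce  # a builtin on Python 2, must be imported on Python 3

pets = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

def my_reduce(obj1, obj2):
    return (obj1[0], max(obj1[1], obj2[1]), min(obj1[2], obj2[2]))

# Without grouping first, every tuple is folded into a single result that
# keeps the first key and mixes the values of both animals.
mixed = reduce(my_reduce, pets)
print(mixed)  # ('dog', 16, 1)
```

The 16 comes from a cat, so the single result is wrong for either key.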

Instead, a simple loop is fine:

>>> my_list = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2)]
>>> output = {}
>>> for animal, high, low in my_list:
...     try:
...         prev_high, prev_low = output[animal]
...     except KeyError:
...         output[animal] = high, low
...     else:
...         output[animal] = max(prev_high, high), min(prev_low, low)

Then if you want the original format back:

>>> output = [(k,) + v for k, v in output.items()]
>>> output
[('dog', 12, 1), ('cat', 15, 1)]

Note that on Python versions before 3.7 a plain dict does not guarantee any ordering, so this can destroy the ordering from the original list. If you want to preserve the order the keys first appear in, initialise output with an OrderedDict instead.
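A minimal sketch of that order-preserving variant, assuming the same four-element my_list as above. It uses a membership test rather than try/except purely as a matter of taste, and result is a name introduced here for illustration.

```python
from collections import OrderedDict

my_list = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2)]

# OrderedDict remembers the order in which keys were first inserted.
output = OrderedDict()
for animal, high, low in my_list:
    if animal in output:
        prev_high, prev_low = output[animal]
        output[animal] = max(prev_high, high), min(prev_low, low)
    else:
        output[animal] = high, low

# Back to the original (key, max, min) tuple format, in first-seen key order.
result = [(k,) + v for k, v in output.items()]
print(result)  # [('dog', 12, 1), ('cat', 15, 1)]
```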

Stefan Pochmann · Accepted Answer · 2015-05-03T15:54:27.043

7

If you want to use your my_reduce and reduce, you can do it this way. It's fairly short, actually:

Preparation:

from functools import reduce  # needed on Python 3; harmless on Python 2.6+
from itertools import groupby
from operator import itemgetter

pets = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

def my_reduce(obj1, obj2):
    return (obj1[0],max(obj1[1],obj2[1]),min(obj1[2],obj2[2]))

Solution:

# groupby only groups consecutive items, so the list is sorted first
print([reduce(my_reduce, group)
       for _, group in groupby(sorted(pets), key=itemgetter(0))])

Output:

[('cat', 16, 1), ('dog', 12, 1)]

edited May 03 '15 at 15:54

answered May 02 '15 at 08:16


May I know what syntax/shorthand you used inside the print statement? It appears to be a function call followed by a for loop, with the variable defined by the for loop passed into the function call: reduce(my_reduce, group) for _, group in groupby(sorted(pets), key=itemgetter(0)) – Lee Dec 13 '17 at 05:29
@Lee That's a "list comprehension". – Stefan Pochmann Dec 13 '17 at 07:33
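For anyone with the same question as Lee, the list comprehension can be unrolled into an ordinary loop. This is just a sketch of the equivalent long form, not a different method; result is a name introduced here for illustration.

```python
from functools import reduce  # a builtin on Python 2, must be imported on Python 3
from itertools import groupby
from operator import itemgetter

pets = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

def my_reduce(obj1, obj2):
    return (obj1[0], max(obj1[1], obj2[1]), min(obj1[2], obj2[2]))

# Long form of:
#   [reduce(my_reduce, group) for _, group in groupby(sorted(pets), key=itemgetter(0))]
result = []
for _, group in groupby(sorted(pets), key=itemgetter(0)):
    # group iterates over the consecutive tuples that share one key
    result.append(reduce(my_reduce, group))

print(result)  # [('cat', 16, 1), ('dog', 12, 1)]
```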

Joran Beasley · Answer 4 · 2015-04-29T02:34:22.520

0

If you really want to use reduce, I think this works (it gives you a dict back instead of a list, but meh):

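from functools import reduce  # needed on Python 3; harmless on Python 2.6+

def my_reduce(obj1, obj2):
    # first call: obj1 is still a plain tuple, so restart with a dict accumulator
    if not isinstance(obj1,dict):
        return reduce(my_reduce,[{},obj1,obj2])
    try:
        obj1[obj2[0]] = max(obj1[obj2[0]][0],obj2[1]),min(obj1[obj2[0]][1],obj2[2])
    except KeyError:
        # key not seen yet: store its (high, low) pair as-is
        obj1[obj2[0]] = obj2[1:]
    return obj1

my_list = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]
print(reduce(my_reduce,my_list))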

I think both of the other solutions are better, however.
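The same idea can probably be written a little more plainly by passing an empty dict as reduce's third argument (the initial value), which removes the need for the isinstance check. This is only a sketch; merge_into and result are hypothetical names, not part of the code above.

```python
from functools import reduce  # a builtin on Python 2, must be imported on Python 3

def merge_into(acc, item):
    # acc maps key -> (max, min); item is one (key, high, low) tuple
    key, high, low = item
    if key in acc:
        prev_high, prev_low = acc[key]
        acc[key] = (max(prev_high, high), min(prev_low, low))
    else:
        acc[key] = (high, low)
    return acc

my_list = [('dog',12,2), ('cat',15,1), ('dog',11,1), ('cat',15,2), ('dog',10,3), ('cat',16,3)]

# The third argument seeds the accumulator, so acc is a dict from the first call on.
result = reduce(merge_into, my_list, {})
print(result)
```

It still returns a dict rather than a list, with one entry per key.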

edited Apr 29 '15 at 02:34

answered Apr 29 '15 at 02:25


No, because this would collapse everything to one element, but I want one element per key – mgoldwasser Apr 29 '15 at 02:26
