How to group a list of tuples/objects by similar index/attribute in Python?

Question

Given a list

old_list = [obj_1, obj_2, obj_3, ...]

I want to create a list:

new_list = [[obj_1, obj_2], [obj_3], ...]

where obj_1.some_attr == obj_2.some_attr.

I could throw some for loops and if checks together, but this is ugly. Is there a Pythonic way to do this? By the way, the attributes of the objects are all strings.

Alternatively, a solution for a list containing tuples (of the same length) instead of objects would be appreciated, too.

_"a list containing tuples (of the same length) instead of objects"_ Does this mean **a list that contains tuples all of the same length**? If yes, what is the "attribute" on which the tuples are grouped? BTW, tuples are objects, aren't they? – eyquem, Jul 06 '11 at 21:08
@eyquem: 1. Yes; 2. The tuples are grouped by the item at a certain index, and that item is a string; 3. I believe so, but I am not certain. :-) – Aufwind, Jul 06 '11 at 21:24

score 94 · Accepted Answer · edited Aug 05 '16 at 07:51

defaultdict is how this is done.

While for loops are largely essential, if statements aren't.

from collections import defaultdict


groups = defaultdict(list)

for obj in old_list:
    groups[obj.some_attr].append(obj)

new_list = list(groups.values())
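
For the tuple variant mentioned in the question, the same pattern should work with an index in place of an attribute. A minimal sketch, assuming made-up sample data and a hypothetical helper named `group_by_index` (neither comes from the question):

```python
from collections import defaultdict


def group_by_index(items, index):
    """Group tuples by the value at position `index` (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[item[index]].append(item)
    # list() turns the dict view into an indexable list on Python 3.
    return list(groups.values())


# Illustrative data, not from the question.
old_list = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
new_list = group_by_index(old_list, 0)
print(new_list)
```

On Python 3.7 and later, dictionaries keep insertion order, so the groups come out in the order their keys first appear. On older versions the order of the groups is arbitrary.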

edited Aug 05 '16 at 07:51 by Delgan · answered Jul 06 '11 at 20:01 by S.Lott

This, of course, does not preserve (or respect in any way) the original order of groups. So it may or may not be what the @Druss wanted. – tjollans Jul 06 '11 at 20:10
@jollybox.de: "does not preserve (or respect in any way) the original order of groups" Correct. When did that become a requirement? – S.Lott Jul 06 '11 at 20:31
I don't know whether it's a requirement, the original question isn't clear on that. I originally read the question that way. Still, good answer. – tjollans Jul 06 '11 at 20:37
Just realized that if you combine the usage of a `dict` with the `itertools.groupby` answers, you don't even need to use `defaultdict`. – JAB Jul 06 '11 at 20:39
@jollybox.de: The ordering is not necessary. But thank you for the foresight. :-) @S.Lott: Thanks for the great answer! – Aufwind Jul 06 '11 at 20:43
Shouldn't one call `list(groups.values())` to actually return what OP wants? I mean, otherwise, if one calls `new_list[0]`, one gets `TypeError: 'dict_values' object does not support indexing` (at least on my machine). – sup Jun 21 '15 at 11:05
Nice answer. How would one update this if instead we want to group the objects based on some threshold instead of exact equality? – mqbaka Oct 17 '22 at 10:41

JAB · Answer 2 · 2011-07-06T20:37:02.093

Here are two cases. Both require the following imports:

import itertools
import operator

You'll be using itertools.groupby and either operator.attrgetter or operator.itemgetter.

For a situation where you're grouping by obj_1.some_attr == obj_2.some_attr:

get_attr = operator.attrgetter('some_attr')
new_list = [list(g) for k, g in itertools.groupby(sorted(old_list, key=get_attr), get_attr)]

For a[some_index] == b[some_index]:

get_item = operator.itemgetter(some_index)
new_list = [list(g) for k, g in itertools.groupby(sorted(old_list, key=get_item), get_item)]

Note that you need the sorting because itertools.groupby makes a new group when the value of the key changes.
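
The effect of skipping the sort is easy to see on a small list. A minimal sketch with made-up tuples (the data and variable names are illustrative assumptions), grouping on index 0:

```python
import itertools
import operator

# Illustrative data: the key "a" appears in two non-adjacent places.
old_list = [("a", 1), ("b", 2), ("a", 3)]
get_item = operator.itemgetter(0)

# Without sorting, groupby starts a new group every time the key changes,
# so "a" ends up in two separate groups.
unsorted_groups = [list(g) for k, g in itertools.groupby(old_list, get_item)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [
    list(g)
    for k, g in itertools.groupby(sorted(old_list, key=get_item), get_item)
]

print(unsorted_groups)  # [[('a', 1)], [('b', 2)], [('a', 3)]]
print(sorted_groups)    # [[('a', 1), ('a', 3)], [('b', 2)]]
```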

Note that you can use this to build a dict like the one in S.Lott's answer without using collections.defaultdict. Each group g is an iterator that is only valid until groupby advances to the next key, so copy it into a list when storing it.

Using a dictionary comprehension (available in Python 2.7 and 3+):

groupdict = {k: list(g) for k, g in itertools.groupby(sorted_list, keyfunction)}

For earlier versions of Python, pass a generator of key/list pairs to dict:

groupdict = dict((k, list(g)) for k, g in itertools.groupby(sorted_list, keyfunction))
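
One detail worth checking when building such a dict: each group yielded by itertools.groupby is a lazy iterator that shares its source with the groupby object, so it should be copied into a list before the next group is requested. A small sketch, with `sorted_list` and `keyfunction` filled in by made-up values purely for illustration:

```python
import itertools
import operator

# Illustrative stand-ins for sorted_list and keyfunction.
keyfunction = operator.itemgetter(0)
sorted_list = sorted([("b", 2), ("a", 1), ("a", 3)], key=keyfunction)

# list(g) copies each group while it is still the current one.
groupdict = {k: list(g) for k, g in itertools.groupby(sorted_list, keyfunction)}

print(groupdict["a"])  # [('a', 1), ('a', 3)]
print(groupdict["b"])  # [('b', 2)]
```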

score 16 · Artsiom Rudzenka · Answer 3 · 2011-07-06T20:23:37.137

You can also use itertools.groupby. Note that the code below is just a sample and should be adapted to your needs:

from itertools import groupby

data = [[1, 2, 3], [3, 2, 3], [1, 1, 1], [7, 8, 9], [7, 7, 9]]

# For example, to group the data by the third element of each item:
# sort by that element first, because groupby only merges adjacent items.
res = [list(v) for _, v in groupby(sorted(data, key=lambda x: x[2]), lambda x: x[2])]

edited Jul 06 '11 at 20:23 · answered Jul 06 '11 at 20:15 by Artsiom Rudzenka

Basically my answer, but you forgot an important aspect: sorting before using `groupby`. – JAB Jul 06 '11 at 20:22
@JAB - you're right. Thank you for pointing that out. – Artsiom Rudzenka Jul 06 '11 at 20:22
@JAB - Why is sorting required before using groupby? – Sahil Chhabra Jul 12 '18 at 17:15
@SahilChhabra Read my answer, I say why. – JAB Jul 12 '18 at 17:30

SzorgosDiák · Answer 4 · 2023-05-09T22:50:33.127

I recently faced a similar issue. Thank you for the solutions provided above. I wrote a small comparison of the computation times of the above-mentioned methods. In my implementation I keep the dictionary, as it is nice to see the keys as well. The defaultdict method was the fastest.

from collections import defaultdict
import time
import itertools
import pandas as pd
import random


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f"Person(name='{self.name}', age={self.age})"


def method_with_dict(people):
    # Plain dict: check for the key before appending.
    groups = {}
    for person in people:
        if person.age in groups:
            groups[person.age].append(person)
        else:
            groups[person.age] = [person]
    return groups


def method_with_defaultdict(people):
    # defaultdict creates the empty list on first access to a key.
    groups = defaultdict(list)
    for person in people:
        groups[person.age].append(person)
    return groups


def group_by_age_with_itertools(people):
    # Note: this sorts the caller's list in place, so after the first call
    # the input is already sorted on later calls.
    people.sort(key=lambda x: x.age)
    groups = {}
    for age, group in itertools.groupby(people, key=lambda x: x.age):
        groups[age] = list(group)
    return groups


def group_by_age_with_pandas(people):
    # Round-trips through a DataFrame and rebuilds Person objects per group.
    df = pd.DataFrame([(p.name, p.age) for p in people], columns=["Name", "Age"])
    groups = df.groupby("Age")["Name"].apply(list).to_dict()
    return {k: [Person(name, k) for name in v] for k, v in groups.items()}


if __name__ == "__main__":
    num_people = 1000
    min_age, max_age = 18, 80
    people = [Person(name=f"Person {i}", age=random.randint(min_age, max_age))
              for i in range(num_people)]

    N = 10000  # number of repetitions per method

    # (label, function) pairs, timed in this order
    methods = [
        ("method_with_defaultdict", method_with_defaultdict),
        ("method_with_dict", method_with_dict),
        ("method_with_itertools", group_by_age_with_itertools),
        ("method_with_pandas", group_by_age_with_pandas),
    ]
    for label, method in methods:
        start_time = time.time()
        for _ in range(N):
            result = method(people)
        end_time = time.time()
        print(f"{label}: {end_time - start_time:.6f} seconds")


method_with_defaultdict: 0.954309 seconds
method_with_dict: 1.301710 seconds
method_with_itertools: 1.868009 seconds
method_with_pandas: 34.422366 seconds
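
For reference, the standard-library timeit module is a common alternative to hand-written time.time() loops. The sketch below is an illustration under stated assumptions, not the script that produced the numbers above: it uses plain (name, age) tuples instead of the Person class, a fixed random seed, shortened function names, and far fewer repetitions. It prints whatever timings your machine yields.

```python
import random
import timeit
from collections import defaultdict


def with_defaultdict(people):
    # Same idea as method_with_defaultdict, but keyed on tuple index 1 (age).
    groups = defaultdict(list)
    for person in people:
        groups[person[1]].append(person)
    return groups


def with_dict(people):
    # Same idea as method_with_dict: check for the key before appending.
    groups = {}
    for person in people:
        if person[1] in groups:
            groups[person[1]].append(person)
        else:
            groups[person[1]] = [person]
    return groups


random.seed(0)  # fixed seed so repeated runs use the same data
people = [(f"Person {i}", random.randint(18, 80)) for i in range(1000)]

for func in (with_defaultdict, with_dict):
    # number=200 is an arbitrary, small repetition count for a quick check.
    seconds = timeit.timeit(lambda: func(people), number=200)
    print(f"{func.__name__}: {seconds:.6f} seconds")
```

Neither function modifies `people`, so every repetition sees the same input.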
