use custom comparator for a specific set

Question

I am storing a number of objects in a set. Is there a way to override the comparator function used just for that set? I know I can override __eq__ and friends but I don't want to do so as I am also storing those objects in other sets.

Illustration:

# suppose Person class has name and address fields
p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")
s1 = set([p1, p2], [equality based on name only])  # hypothetical syntax: this should contain only one of p1 or p2
s2 = set([p1, p2])  # this should contain p1 and p2

Do you care which of `p1` or `p2` is chosen? You can use `itertools.groupby` to group the items by name, then take the first element of each group. — chepner, Jun 19 '21 at 18:20
You could subclass `set` but that's not a great idea. Better to use a ***set comprehension***. The `itertools.groupby` suggestion is good. — smci, Jun 19 '21 at 18:41
Related: [Set comprehension and different comparable relations](https://stackoverflow.com/questions/44190339/set-comprehension-and-different-comparable-relations) — smci, Jun 19 '21 at 18:43
Can we assume your `Person` object has at most two attributes, hence the dict hack approach is acceptable? (IMO that's hacky and limiting) — smci, Jun 19 '21 at 18:47

chepner · Answer 1 · 2021-06-19T18:33:54.697

set doesn't let you specify how two objects are compared; it relies on the class's own __eq__ and __hash__ methods.

However, you can group the objects by an arbitrary key function, then construct an appropriate sequence to pass to set.

Here's a solution using itertools.groupby:

from itertools import groupby

def get_name(p):
    return p.name  # or however you get the name of a Person instance


s1 = set(next(v) for _, v in groupby(sorted([p1, p2], key=get_name), get_name))
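A self-contained version of the snippet above, as a sketch. It assumes Person is a frozen dataclass (the question never shows its definition), and it wraps the expression in a hypothetical helper, unique_by, so the key can be swapped per set:

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


# Assumed Person definition: frozen=True makes instances hashable,
# and the generated __eq__/__hash__ compare both name and address.
@dataclass(frozen=True)
class Person:
    name: str
    address: str


def unique_by(items, key):
    """Hypothetical helper: keep one item per distinct key(item)."""
    # groupby only merges *adjacent* items, so sort by the same key first
    return {next(group) for _, group in groupby(sorted(items, key=key), key)}


p1 = Person("Alice", "addr1")
p2 = Person("Alice", "addr2")

s1 = unique_by([p1, p2], key=attrgetter("name"))  # one Alice
s2 = {p1, p2}                                     # both, since the addresses differ
print(len(s1), len(s2))  # 1 2
```

Because sorted is stable, next(group) returns the earliest item in the input among those sharing a name, which answers the "which one is chosen?" question from the comments.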

After sorting by name, groupby puts all Persons with the same name in a single group. Iterating over the result yields pairs like ("Alice", <iterator over the Persons named Alice>). You can ignore the key and just call next on the group's iterator to get one Person named Alice.
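To see what groupby actually yields, here is a small sketch that uses a namedtuple as a stand-in for the question's Person class (an assumption) and materializes each group into a list:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Stand-in for the question's Person class (assumed, not from the question)
Person = namedtuple("Person", ["name", "address"])
get_name = attrgetter("name")

people = [Person("Bob", "addr1"), Person("Alice", "addr1"), Person("Alice", "addr2")]

# Each item from groupby is a (key, group_iterator) pair; list() consumes the group
groups = [(name, list(group)) for name, group in groupby(sorted(people, key=get_name), get_name)]
for name, members in groups:
    print(name, members)
```

Each group is converted to a list inside the comprehension because a group iterator shares its source with groupby and stops being usable once groupby advances to the next key.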

Note that, depending on how you do the grouping, "equal" elements can still end up in different groups; in that case set will discard the duplicates as usual.
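One concrete case of the grouping mattering, sketched with an assumed namedtuple Person: if you skip the sort, same-named people who are not adjacent land in separate groups. The set cannot collapse them afterwards, because the namedtuples also differ in address and therefore are not equal:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

Person = namedtuple("Person", ["name", "address"])  # assumed stand-in
get_name = attrgetter("name")

people = [Person("Alice", "addr2"), Person("Bob", "addr1"), Person("Alice", "addr1")]

# Unsorted: the two Alices are not adjacent, so groupby yields three groups
unsorted_result = {next(g) for _, g in groupby(people, get_name)}
# Sorted by the same key: both Alices fall into one group
sorted_result = {next(g) for _, g in groupby(sorted(people, key=get_name), get_name)}

print(len(unsorted_result), len(sorted_result))  # 3 2
```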

score 0 · Answer 2 · answered Jun 19 '21 at 18:41

You can do this using a dictionary, since dictionary keys are unique in the same way set elements are (both are hash tables under the hood):

# Just to simulate a class, not really necessary
import collections
Person = collections.namedtuple('Person', ('name', 'address'))

people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]
s = set({person.name: person for person in people}.values())

print(s)
# Output (element order may vary): {Person(name='Bob', address='addr1'), Person(name='Alice', address='addr2')}
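The dict comprehension keeps the last person seen for each name, which is why the output has Alice at addr2. If you would rather keep the first one, a minimal sketch using dict.setdefault (which only inserts when the key is missing) looks like this:

```python
from collections import namedtuple

Person = namedtuple("Person", ("name", "address"))
people = [Person("Alice", "addr1"), Person("Alice", "addr2"), Person("Bob", "addr1")]

# Dict comprehension: later entries overwrite earlier ones, so the LAST person per name wins
last_wins = set({person.name: person for person in people}.values())

# setdefault leaves an existing key untouched, so the FIRST person per name wins
first = {}
for person in people:
    first.setdefault(person.name, person)
first_wins = set(first.values())
```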
