5

So, there exists an easy way to calculate the intersection of two sets via set.intersection(). However, I have the following problem:

class Person(Object):                    
    def __init__(self, name, age):                                                      
        self.name = name                                                                
        self.age = age                                                                  

l1 = [Person("Foo", 21), Person("Bar", 22)]                                             
l2 = [Person("Foo", 21), Person("Bar", 24)]                                             

union_list = list(set(l1).union(l2))                                           
# [Person("Foo", 21), Person("Bar", 22), Person("Bar", 24)]

(Object is a base-class provided by my ORM that implements basic __hash__ and __eq__ functionality, which essentially adds every member of the class to the hash. In other words, the __hash__ returned will be a hash of every element of the class)

At this stage, I would like to run a set intersection operation by .name only, to find, say, Person('Bar', -1).intersection(union_list) #= [Person("Bar", -1), Person("Bar", 22), Person("Bar", 24)]. (The typical .intersection() at this point would not give me anything. I can't override __hash__ or __eq__ on the Person class, as this would override the original set union, I think.)

What's the best way to do this in Python 2.x?

EDIT: Note that the solution doesn't have to rely on a set. However, I need to find unions and then intersections, so it feels like this is amenable to a set (but I'm willing to accept solutions that use whatever magic you deem worthy, so long as it solves my problem!)

Ben Stott
  • I don't understand your desired result. Could you please *explain* what the result should contain? – Sven Marnach May 30 '12 at 08:51
  • Err crap, that should be .union, not .intersection. I've updated the question -- let me know if this is clearer? – Ben Stott May 30 '12 at 08:52
  • I'm still a bit confused since the example code does not have the result you claim. – Sven Marnach May 30 '12 at 08:57
  • Your example is incorrect - the sets don't work like you think because you didn't define hashing and equality methods on your class. – interjay May 30 '12 at 08:57
  • Oh, of course -- sorry, in my actual code this is a database class and thus the ORM takes care of hashing and whatnot. I'll update my example to reflect this – Ben Stott May 30 '12 at 09:00
  • Your other examples don't make sense either. The default intersection would not give `[Person("Bar", 24)]`, it would give `l2`. And I don't understand why you expect to get `[Person("Bar", 22), Person("Bar", 24)]` from your operation. – interjay May 30 '12 at 09:05
  • That's what I would _like_ for it to give; I would like to use python's set intersection to intersect on _only_ one member of a class (when the `__hash__` and `__eq__` functions are already overridden (and I require the behaviour they already have)) – Ben Stott May 30 '12 at 09:08
  • But *why* would you expect that result? Why wouldn't `Person("Foo", 21)` be part of the result? I don't understand what your "intersect by name" operation means. – interjay May 30 '12 at 09:11
  • Right, I see where the question is ill-specified. I'll fix this (I swear I'll get this right eventually). - EDIT: Updated. – Ben Stott May 30 '12 at 09:13
  • Better, but now I don't see why `Person('Bar', -1)` is not part of the result. – interjay May 30 '12 at 09:33
  • Which ORM is this? Couldn't you do this using your ORM/database? – Marcin May 31 '12 at 07:19

6 Answers

7

Sounds like

>>> class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...     def __eq__(self, other):
...         return self.name == other.name
...     def __hash__(self):
...         return hash(self.name)
...     def __str__(self):
...         return self.name
...
>>> l1 = [Person("Foo", 21), Person("Bar", 22)]
>>> l2 = [Person("Foo", 21), Person("Bar", 24)]
>>> union_list = list(set(l1).union(l2))
>>> [str(l) for l in union_list]
['Foo', 'Bar']

is what you want, since name is your unique key?

Jonas Byström
  • Ah, no, the ORM I'm using already provides a __eq__ and __hash__ method (and, as such, set.union() already produces 'sane' results). I'm looking for a way to do an intersection operation that *only* uses one of the class's members as the key, and as such can't override `__hash__` or `__eq__`. – Ben Stott May 30 '12 at 09:51
  • I see, then perhaps glglgl's solution would be suitable? – Jonas Byström May 30 '12 at 10:48
2

I hate answering my own questions, so I'll hold off on marking this as the 'answer' for a little while yet.

Turns out the way to do this is as follows:

import types

p = Person("Bar", -1)
# Hash on the name only
new_hash_method = lambda obj: hash(obj.name)
# Rebind __hash__ on each instance
p.__hash__ = types.MethodType(new_hash_method, p)
for person in union_list:
    person.__hash__ = types.MethodType(new_hash_method, person)
# intersection() takes an iterable, so wrap the single object in a list
set(union_list).intersection([p])

It's certainly dirty and it relies on types.MethodType, but it's less intensive than the best solution proposed so far (glglgl's solution): my actual union_list can contain on the order of thousands of items, so this saves me from re-creating objects every time I run this intersection procedure.

Ben Stott
  • Does this actually work though? The documentation indicates that magic methods like `__hash__` are looked up on the class, not the instance. https://docs.python.org/3/reference/datamodel.html#special-lookup – Cameron Lee Nov 18 '14 at 19:35
  • Actually, looks like it does work for old style classes, but not for new style classes: https://docs.python.org/2/reference/datamodel.html#special-method-lookup-for-old-style-classes – Cameron Lee Nov 18 '14 at 19:37
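
The lookup rule mentioned in these comments can be checked with a small sketch. The `Person` class below is a plain stand-in for the ORM-backed one (an assumption, since the real base class isn't shown), written as a new-style class:

```python
class Person(object):  # new-style class (the only kind in Python 3)
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __hash__(self):
        # Class-level hash over every member, as the ORM base is described as doing
        return hash((self.name, self.age))


p = Person("Bar", -1)
class_level_hash = hash(p)

# Attach a different __hash__ to this one instance only
p.__hash__ = lambda: 12345

# hash() looks __hash__ up on type(p), so the instance attribute is ignored
patched_is_ignored = hash(p) == class_level_hash
```

Calling `p.__hash__()` directly does return the patched value, but `hash(p)` and therefore `set` never see it, so on a new-style class the per-instance patch has no effect on set operations.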
2

How about:

d1 = {p.name:p for p in l1}
d2 = {p.name:p for p in l2}

intersectnames = set(d1.keys()).intersection(d2.keys())
intersect = [d1[k] for k in intersectnames]

It might be faster to throw intersectnames at your ORM, in which case you wouldn't build dictionaries, just collect names in lists.
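
One caveat with a plain name-to-person dict is that it keeps a single person per name, so a list holding both Person("Bar", 22) and Person("Bar", 24) would lose one of them. A possible variant, sketched here with a namedtuple standing in for the ORM class and a hypothetical helper `group_by_name`, maps each name to a list instead:

```python
from collections import defaultdict, namedtuple

# Stand-in for the ORM-backed class (assumption: only .name and .age matter here)
Person = namedtuple("Person", ["name", "age"])


def group_by_name(people):
    """Map each name to the list of all people carrying it."""
    groups = defaultdict(list)
    for person in people:
        groups[person.name].append(person)
    return groups


union_list = [Person("Foo", 21), Person("Bar", 22), Person("Bar", 24)]
by_name = group_by_name(union_list)

# Every person whose name matches, regardless of age
bars = by_name.get("Bar", [])
```

Building the index is a single pass over the list, and each later lookup by name is a dict access, which may matter if the same list is queried repeatedly.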

Marcin
1

You'll have to override __hash__ and the comparison methods if you want to use sets like this.

If you don't, then

Person("Foo", 21) == Person("Foo", 21)

will always be false.

If your objects are managed by an ORM, then you'll have to check how it compares objects. Usually it only looks at the object's id, and comparison only works if both objects are managed. If you try to compare an object you got from the ORM with an instance you created yourself before it's persisted to the db, then they are likely to be different. Anyway, an ORM shouldn't have problems with you supplying your own comparison logic.

But if for some reason you can't override __hash__ and __eq__, then you can't use sets for intersection and union with the original objects. You could:

  • calculate the intersection/union yourself
  • create a wrapper class which is comparable:

    class Person:                    
        def __init__(self, name, age):                                                      
            self.name = name                                                                
            self.age = age                                                                  
    
    l1 = [Person("Foo", 21), Person("Bar", 22)]                                             
    l2 = [Person("Foo", 21), Person("Bar", 24)]                                             
    
    class ComparablePerson:
        def __init__(self, person):
            self.person = person
    
        def __hash__(self):
            return hash(self.person.name) + 31*hash(self.person.age)
    
        def __eq__(self, other):
            return (self.person.name == other.person.name and
                    self.person.age == other.person.age)
        def __repr__(self):
            return "<%s - %d>" % (self.person.name, self.person.age)
    
    c1 = set(ComparablePerson(p) for p in l1)
    c2 = set(ComparablePerson(p) for p in l2)
    
    print c1
    print c2
    print c1.union(c2)
    print c2.intersection(c1)
    
mata
  • See my comment (on the original question); the override is already dealt with by an ORM. I'll update the question to reflect this. – Ben Stott May 30 '12 at 09:01
1

This is clunky, but...

(set(p for p in union_list for q in l2 if p.name == q.name and p.age != q.age)
 | set(p for p in l2 for q in union_list if p.name == q.name and p.age != q.age))
# {person(name='Bar', age=22), person(name='Bar', age=24)}
bbayles
1

If you want the age to be irrelevant for comparison, you should override __hash__() and __eq__() in Person, even though your Object base class already defines them.

If you need this behaviour only in this (and similar) contexts, you could create a wrapper object which contains the Person and behaves differently, such as

class PersonWrapper(Object):
    def __init__(self, person):
        self.person = person
    def __eq__(self, other):
        if hasattr(other, 'person'):
            return self.person.name == other.person.name
        else:
            return self.person.name == other.name
    def __hash__(self):
        return hash(self.person.name)

and then do

union_list = list(set(PersonWrapper(i) for i in l1).union(PersonWrapper(i) for i in l2))
# a list of PersonWrapper objects, one per distinct name; use .person to get the original Person back

(untested)

glglgl
  • The issue is I need the `__hash__` and `__eq__` methods the way they are, otherwise `.union()` won't work the way it does. – Ben Stott May 30 '12 at 09:53
  • Hmm, interesting. So there's no way to do this without reconstructing objects? I know C++ gives me the option to pass a custom comparator; Python doesn't have the same ability? – Ben Stott May 30 '12 at 10:08
  • There is a way to do so with functions like `sorted()` where you can give a `cmp` function as well as a `key` function, but not with `set`s, alas... – glglgl May 30 '12 at 10:11
  • Damn. I've added an edit to the question to point out that the solution doesn't _have_ to rely on a set, however I get the feeling this isn't going to change anything and I'm still going to have to use list comprehensions or genexps or something. – Ben Stott May 30 '12 at 10:17
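
Since `set` takes no key function, one way to get a `key`-style intersection is a small helper built on a comprehension. This is only a sketch: `intersect_by` is a hypothetical name, and the namedtuple stands in for the ORM class (it hashes and compares on all fields, like the base class described in the question).

```python
from collections import namedtuple

# Stand-in for the ORM-backed class (assumption: equality covers every field)
Person = namedtuple("Person", ["name", "age"])


def intersect_by(items, others, key):
    """Return the elements of `items` whose key also occurs among `others`."""
    wanted = set(key(other) for other in others)
    return [item for item in items if key(item) in wanted]


l1 = [Person("Foo", 21), Person("Bar", 22)]
l2 = [Person("Foo", 21), Person("Bar", 24)]
# The ordinary union still uses the full-field __hash__ and __eq__
union_list = list(set(l1).union(l2))

# Name-only intersection against a probe object; no wrappers or copies are made
matches = intersect_by(union_list, [Person("Bar", -1)], key=lambda p: p.name)
```

Here `matches` holds the two "Bar" entries from `union_list` (in whatever order the set produced), and `__hash__` and `__eq__` on the class are left untouched. The cost is one pass over each input, so it should stay cheap for lists of a few thousand items.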