The only required property is that objects which compare equal have the same hash value.
The "compare equal" in the spec text means the result of their __eq__
methods - there is no requirement that they are the same object.
The __hash__
, thought, have to be based in the values that are used in __eq__
, not in the object's "id" - that part is incorrect in your code. For it to work, this is how it would have to be:
Just do:
...
def __eq__( self, b ):
return self.v[0] == b.v[0] and self.v[1] == b.v[1]
def __hash__( self ):
return hash((self.v[0], self.v[1]))
Does that mean that two distinct but equivalent objects cannot be used as different keys in a dict, or appear individually in a set?
Yes. This is what the spec means.
The workaround for that is to leave the default __eq__
implementation for your class to conform to the rules, and implement an alternate form of comparison that you will have to use in your code.
The most straightforward way is just to leave the default implementation of __eq__
as it is, which compares by identity, and have an arbitrary method that you use for comparison,( the idiom that code in languages that do not support operator overriding have to use anyway):
class A( object ):
...
def equals( self, b ):
return self.v[0] == b.v[0] and self.v[1] == b.v[1]
p = A( 1, 0 )
q = A( 1, 0 )
print( str( p ), str( q ) )
print( "identical?", p is q )
print( "equivalent?", p.equals(q) )
There are ways to improve a little on this - but the baseline is: __eq__
have to conform to the rules, and do an identity comparison.
One way is to have an internal associated object that works as a "namespace" which you can use for comparison:
class CompareSpace:
def __init__(self, parent):
self.parent = parent
def __eq__( self, other ):
other = other.parent if isinstance(other, type(self)) else other
return self.parent.v[0] == other.v[0] and other.v[1] == b.parent.v[1]
class A:
def __init__( self, v1, v2 ):
self.v = ( v1, v2 )
self.comp = CompareSpace(self)
def __str__( self ):
return str( self.v )
p = A( 1, 0 )
q = A( 1, 0 )
print( str( p ), str( q ) )
print( "identical?", p is q )
print( "equivalent?", p.comp == q )
print( "hashes", hash(p), hash(q) )
demonstration of brokenness
Now I will insert an example of how this breaks - I am creating a class deliberatly more broken, to ensure the problem occurs at first try. But if the problem occurs even once each 2 million times, your code will still be too broke to use in anything real, even if personal code: you will have a dictionary that is not deterministic:
class Broken:
def __init__(self, name):
self.name = name
def __hash__(self):
return id(self) % 256
def __eq__(self, other):
return True
def __repr__(self):
return self.name
In [23]: objs = [Broken(f"{i:02d}") for i in range(64)]
In [24]: print(objs)
[00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
In [25]: test = {}
In [26]: for obj in objs:
...: if obj not in test:
...: test[obj] = 0
...:
In [27]: print(test)
{00: 0, 01: 0, 02: 0, 11: 0}
# Or with simple, unconditional, insertion:
In [29]: test = {obj: i for i, obj in enumerate(objs)}
In [30]: test
Out[30]: {00: 57, 01: 62, 02: 63, 11: 60}
(I repeat, while your has values won't clash by themselves, internal dict code have to reduce the number in the hash to an index in to its hash table - not necessarily by module (%) - otherwise every empty dict would need 2 ** 64 empty entries, and only if all hashes were only 64bit wide)