2

I'm walking a data structure and would like to build a dict mapping X->Y, where X is a field in the data structure I'm walking and Y is a field in the data structure I'm building on the fly. X is an unhashable type.

Russ Weeks
  • 1
    If X is unhashable then it stands to reason that it can change (which makes mapping a bit hard). Do you want that particular 'X' instance to always point to that 'Y' instance, or do you want any X with that value to point to that 'Y'? If it's the first, you can assign an id to each X and map that id to a 'Y'. If it's the second, you can temporarily store X in a hashable container such as a tuple and use that as the key. There are probably faster ways to do this (my Python's a bit rusty), but it should work. – Xonar Jun 11 '13 at 21:42
  • @Xonar: Pretty sure you can't hash a tuple containing unhashable items – Eric Jun 11 '13 at 21:49
  • Yes, true. (I did say my Python is rusty :)) But you can recursively convert the unhashable items into tuples, e.g. turn (1,2,[2,3]) into (1,2,(2,3)). Thanks for pointing that out. – Xonar Jun 11 '13 at 22:16
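
The recursive conversion described in the last comment could look roughly like the sketch below. The helper name `freeze` is hypothetical, and the sketch assumes the data contains only lists, tuples, sets, dicts and already-hashable leaves. Note that it keys by value, so a list and a tuple with the same contents end up as the same key.

```python
def freeze(obj):
    """Recursively convert common containers into hashable equivalents."""
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        return frozenset((key, freeze(value)) for key, value in obj.items())
    return obj  # assumed to be hashable already


mapping = {}
mapping[freeze((1, 2, [2, 3]))] = "y"

# A different object with the same value finds the same entry.
print(mapping[freeze([1, 2, [2, 3]])])
```

This only suits the "any X of that value" case. If X is mutated after insertion, the frozen key no longer matches it.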

4 Answers

1

The purpose of Java's IdentityHashMap is to simulate dynamic fields. Since Python already supports dynamic attributes directly, you don't need the map; just assign Y to an attribute of X:

x.someSuchRelation = y
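
A minimal sketch of the attribute approach, assuming X is an instance of a user-defined class. The class `Node` is a hypothetical stand-in made unhashable on purpose. The last part shows a limitation: built-in containers such as list do not accept new attributes.

```python
class Node:
    """Hypothetical stand-in for X: unhashable, but accepts new attributes."""

    __hash__ = None  # makes instances unhashable, like the question's X

    def __init__(self, name):
        self.name = name


x = Node("x")
y = {"built": "on the fly"}

x.someSuchRelation = y  # attach Y directly to this X instance

# Built-in containers have no instance __dict__, so the same trick fails there.
try:
    [].someSuchRelation = y
except AttributeError:
    list_accepts_attributes = False
else:
    list_accepts_attributes = True
```
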
ZhongYu
1

You can just use a regular Python dict for this if you wrap your unhashable objects in another object. Specifically, something like this:

class Wrapper(object):
    def __init__(self, o):
        self.o = o

    def __hash__(self):
        return id(self.o)

    def __eq__(self, o):
        return hash(self) == hash(o)

Then just use it like some_dict[Wrapper(unhashable_object)].

This is a more useful approach than just using id(o) as the key if you also need to be able to access the object itself afterwards (as key.o). If you don't (and garbage collection isn't an issue), just use id(o) directly.
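
A variant of the wrapper above that compares the wrapped objects with `is` instead of comparing hashes might look like this. The class name `IdentityKey` is hypothetical. This is a sketch, not a drop-in replacement.

```python
class IdentityKey:
    """Hypothetical variant of Wrapper: hashes and compares by object identity."""

    def __init__(self, obj):
        self.obj = obj  # strong reference keeps obj alive while the key exists

    def __hash__(self):
        return id(self.obj)

    def __eq__(self, other):
        # Compare the wrapped objects themselves rather than their hashes.
        if not isinstance(other, IdentityKey):
            return NotImplemented
        return self.obj is other.obj


a = [1, 2]
b = [1, 2]  # equal to a by value, but a distinct object

some_dict = {IdentityKey(a): "first"}
some_dict[IdentityKey(b)] = "second"
```

Because each key holds a reference to its object, the object cannot be collected while it is in the dict, so its id cannot be reused by another key.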

Cairnarvon
  • You should not implement equality in terms of hash equality - the size of py_hash_t may be smaller than the size of a pointer. – Eric Nov 03 '20 at 09:36
1

A solution often given to this common problem is to use id, but it is broken: id is only unique among objects that are alive at the same time, so the following can randomly happen:

>>> x = SomeObject()
>>> idmap = {}
>>> idmap[id(x)] = 42
>>> del x
>>> z = SomeObject()
>>> id(z) in idmap
True

No explicit del statement is needed; adding a key to idmap inside a function can lead to the same result:

>>> def add_smthg(idmap):
...     x = SomeObject()
...     idmap[id(x)] = 42
...
>>> idmap = {}
>>> add_smthg(idmap)
>>> z = SomeObject()
>>> id(z) in idmap
True

To avoid this, you have to keep a reference to each object you insert. IMHO the only viable option is to create new dictionary / set classes:

class IdentitySet:
    def __init__(self, items=None):
        if items is None:
            items = []

        self._identities = {id(item): item for item in items}

    def add(self, item):
        self._identities[id(item)] = item
    
    def __delitem__(self, item):
        del self._identities[id(item)]

    def __contains__(self, item):
        return id(item) in self._identities


class IdentityDict:
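    def __init__(self, pairs=None):
        # Materialize first: pairs is traversed twice below, which would
        # silently exhaust a generator on the first pass.
        pairs = [] if pairs is None else list(pairs)

        self._identities = IdentitySet(k for k, _ in pairs)
        self._values = {id(k): v for k, v in pairs}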

    def __getitem__(self, item):
        return self._values[id(item)]

    def __setitem__(self, item, value):
        self._identities.add(item)
        self._values[id(item)] = value
    
    def __delitem__(self, item):        
        del self._identities[item]
        del self._values[id(item)]
    
    def __contains__(self, item):
        return item in self._identities
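
The same idea can be sketched more compactly with a single dict that maps id(key) to a (key, value) pair, so the key reference and the value live in one place. The class name `IdentityMap` is hypothetical, and this is an illustrative variant rather than the classes above.

```python
class IdentityMap:
    """Hypothetical single-dict variant: id(key) -> (key, value)."""

    def __init__(self):
        self._entries = {}

    def __setitem__(self, key, value):
        # Storing the key itself keeps it alive, so its id cannot be reused.
        self._entries[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._entries[id(key)][1]

    def __delitem__(self, key):
        del self._entries[id(key)]

    def __contains__(self, key):
        return id(key) in self._entries

    def __len__(self):
        return len(self._entries)


first = [1, 2]
second = [1, 2]  # same value, different object

m = IdentityMap()
m[first] = "a"
```

Here `first in m` holds while `second in m` does not, since membership is decided by identity rather than by value.
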
agemO
0

Trivially:

idmap = {}
idmap[id(x)] = y

Use the id of x as the dictionary key.
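
A short usage sketch with a list as the unhashable key. It assumes x stays referenced elsewhere for as long as its entry is in idmap. The variable names are illustrative.

```python
x = [1, 2, 3]  # unhashable key object, kept alive by this name
y = "value for x"

idmap = {}
idmap[id(x)] = y

# Lookup goes through id() as well; x itself is never hashed.
found = idmap[id(x)]

# An equal but distinct list has a different id while x is alive.
other = [1, 2, 3]
other_is_key = id(other) in idmap
```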

Eric
  • 1
    Note that for custom classes, `hash(x)` defaults to `id(x)` - you'll find `id` is unnecessary here in some cases. – Eric Jun 11 '13 at 22:06
  • 3
    Note that `x` must survive via a reference elsewhere to guarantee that its `id` won’t be reused. – Davis Herring Oct 06 '18 at 01:28
  • Downvoted because, as @DavisHerring said, it is subtly broken: the id can be reused, possibly leading to random errors. – agemO Nov 03 '20 at 09:34