TL;DR Is there any way to create a weak reference that will call a callback upon having 1 strong reference left instead of 0?


For those who think it's an X Y problem, here's the long explanation:

I have quite a challenging issue that I'm trying to solve with my code.

Suppose we have an instance of some class Foo, and a different class Bar which references the instance as it uses it:

class Foo:  # Can be anything
    pass

class Bar:
    """I must hold the instance in order to do stuff"""
    def __init__(self, inst):
        self.inst = inst

foo_to_bar = {}
def get_bar(foo):
    """Creates Bar if one doesn't exist"""
    return foo_to_bar.setdefault(foo, Bar(foo))

# We can either have
bar = get_bar(Foo())
# Bar must hold a strong reference to foo

# Or
foo = Foo()
bar = get_bar(foo)
bar2 = get_bar(foo)  # Same Bar
del bar
del bar2
bar3 = get_bar(foo)  # Same Bar
# In this case, as long as foo exists, we want the same bar to show up,
# therefore, foo must in some way hold a strong reference back to bar

Now here's the tricky part: You can solve this issue using a circular reference, where foo references bar and bar references foo, but hey, what's the fun part in that? It will take longer to clean up, will not work if Foo defines __slots__, and is generally a poor solution.

Is there any way I can create a foo_to_bar mapping that cleans itself up once only a single reference to each of foo and bar remains? In essence:

import weakref
foo_to_bar = weakref.WeakKeyDictionary()
# If bar is referenced only once (as the dict value) and foo is
# referenced only once (from bar.inst) their mapping will be cleared out

This way it can work perfectly as having foo outside the function makes sure bar is still there (I might require __slots__ on Foo to support __weakref__) and having bar outside the function results in foo still being there (because of the strong reference in Bar).

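WeakKeyDictionary does not work because it holds its values strongly: the entry is effectively {weakref.ref(foo): bar}, and bar.inst references foo, so the key is kept alive by its own value and the entry is never removed.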
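To make the problem concrete, here is a minimal sketch of the failure mode, reusing the Foo and Bar classes from above. The helper name entries_left_after_dropping_foo is made up for this illustration.

```python
import gc
import weakref


class Foo:
    pass


class Bar:
    def __init__(self, inst):
        self.inst = inst  # strong reference back to the dictionary key


def entries_left_after_dropping_foo():
    """Hypothetical helper: count entries once no outside reference remains."""
    foo_to_bar = weakref.WeakKeyDictionary()
    foo = Foo()
    foo_to_bar[foo] = Bar(foo)  # the value is held strongly by the dict
    del foo                     # drop the only outside reference to the key
    gc.collect()                # the cycle collector cannot help here either
    return len(foo_to_bar)


leftover = entries_left_after_dropping_foo()
```

The entry survives because the dictionary keeps bar alive and bar.inst keeps foo alive, so the weak key never dies while the dictionary itself is reachable.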

Alternatively, is there any way to hook into the reference counting mechanism (in order to clean when both objects get to 1 reference each) without incurring significant overhead?

Bharel

1 Answer

You are overthinking this. You don't need to track if there is just one reference left. Your mistake is to create a circular reference in the first place.

Store _BarInner objects in your cache, that have no reference to Foo instances. Upon access to the mapping, return a lightweight Bar instance that contains both the _BarInner and Foo references:

from weakref import WeakKeyDictionary
from collections.abc import Mapping


class Foo:
    pass


class Bar:
    """I must hold the instance in order to do stuff"""
    def __init__(self, inst, inner):
        self._inst = inst
        self._inner = inner

    # Access to interesting stuff is proxied on to the inner object,
    # with the instance information included *as needed*.
    @property
    def spam(self):
        return self._inner.spam(self._inst)


class _BarInner:
    """The actual data you want to cache"""
    def spam(self, instance):
        # do something with instance, but *do not store any references to that
        # object on self*.
        pass  # placeholder so the method body is valid


class BarMapping(Mapping):
    def __init__(self):
        self._mapping = WeakKeyDictionary()

    def __getitem__(self, inst):
        inner = self._mapping.get(inst)
        if inner is None:
            inner = self._mapping[inst] = _BarInner()
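        return Bar(inst, inner)

    # Mapping is abstract, so __iter__ and __len__ are required as well.
    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)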
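A compact, self-contained sketch of how this pattern behaves in use. It replaces the mapping class with a plain get_bar function, and the calls counter, the _inners name and the tuple returned by spam are assumptions added only so there is something to observe:

```python
import gc
from weakref import WeakKeyDictionary


class Foo:
    pass


class _BarInner:
    """Cached per-Foo state; never stores the Foo itself."""
    def __init__(self):
        self.calls = 0  # illustrative state only

    def spam(self, instance):
        self.calls += 1
        return (type(instance).__name__, self.calls)


class Bar:
    """Short-lived wrapper pairing a Foo with its cached inner object."""
    def __init__(self, inst, inner):
        self._inst = inst    # strong reference, lives only as long as this Bar
        self._inner = inner

    def spam(self):
        return self._inner.spam(self._inst)


_inners = WeakKeyDictionary()  # Foo -> _BarInner, no reference back to Foo


def get_bar(foo):
    inner = _inners.get(foo)
    if inner is None:
        inner = _inners[foo] = _BarInner()
    return Bar(foo, inner)


foo = Foo()
first = get_bar(foo).spam()    # a fresh Bar wrapper each time...
second = get_bar(foo).spam()   # ...but the same cached inner state

bar = get_bar(foo)
del foo
gc.collect()
alive_while_bar_held = len(_inners)  # bar still keeps the Foo alive

del bar
gc.collect()
alive_after_release = len(_inners)   # nothing references the Foo any more
```

Each Bar is a throwaway, so "the same bar" here means the same cached inner state rather than the same wrapper object. The cache entry lasts exactly as long as something outside the mapping references the Foo, either directly or through a Bar.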

Translating this to the bdict project linked in the comments, you can simplify things drastically:

  • Don't worry about lack of support for weak references in projects. Document that your project will only support per-instance data on types that have a __weakref__ attribute. That's enough.
  • Don't distinguish between slots and no-slots types. Always store per-instance data away from the instances. This lets you simplify your code.
  • The same goes for the 'strong' and 'autocache' flags. The flyweight should always keep a strong reference. Per-instance data should always be stored.
  • Use a single class for the descriptor return value. The ClassBoundDict type is all you need. Store the instance and owner data passed to __get__ in that object, and vary behaviour in __setitem__ accordingly.
  • Look at collections.ChainMap() to encapsulate access to the class and instance mappings for read access.
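As a rough illustration of the ChainMap suggestion, the sketch below layers an instance-level mapping over a class-level one. The names class_level and instance_level and their contents are invented for the example and are not taken from bdict:

```python
from collections import ChainMap

# Hypothetical data: defaults shared by the class, overrides for one instance.
class_level = {"color": "red", "size": 1}
instance_level = {"size": 5}

# Lookups search instance_level first, then fall back to class_level.
view = ChainMap(instance_level, class_level)

size = view["size"]     # found in the instance mapping
color = view["color"]   # falls through to the class mapping

# Writes through a ChainMap only ever touch the first mapping.
view["shape"] = "round"
```

Because writes land in the first mapping only, the class-level data stays untouched unless it is modified directly.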
Martijn Pieters
  • I guess I tried optimizing too much and turned the code a little ugly. While I did know `ChainMap` beforehand, I haven't thought of always accessing the original `BDict`. It causes a little bit of overhead but creates a clean codebase. Thanks mate, you're a great teacher. – Bharel Jun 21 '18 at 22:27