
I'm trying to memoize a method foo(dti: DatetimeIndex) using the @functools.lru_cache() decorator. However, it fails with TypeError: unhashable type: 'DatetimeIndex'.

Since DatetimeIndex objects are immutable, there should be a good way to use them as a key for memoization, right?

Also, what would be wrong with DatetimeIndex defining a hash method to simply return its id()?
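For reference, here is a minimal reproduction of the error (my own sketch; the trivial body of `foo` stands in for the expensive calculation):

```python
import functools
import pandas as pd

@functools.lru_cache(maxsize=None)
def foo(dti):
    # Stand-in for the expensive calculation
    return len(dti)

dti = pd.date_range("2017-05-18", periods=3)
try:
    foo(dti)
except TypeError as exc:
    print(exc)  # unhashable type: 'DatetimeIndex'
```

lru_cache stores its arguments as dict keys, so hashing the DatetimeIndex is attempted immediately on the first call, which is where the TypeError comes from.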

Ken Williams
  • Not "memorize", memoize. See https://en.wikipedia.org/wiki/Memoization . – Ken Williams May 18 '17 at 20:55
  • I want to cache it simply because it's expensive to calculate `foo`. In my case it's an intrinsically expensive operation, there's little I can do to make it faster. – Ken Williams May 18 '17 at 21:00
  • Yes, it is, otherwise I wouldn't be looking for a solution. I'm not sure why you doubt it. For an example: suppose the user gets to interactively input arbitrary integers and we want to prime factorize each one. There's no way to avoid re-work unless you cache the output of previous factorizations. This is a situation like that - I don't know what `DatetimeIndex` values I'm going to see, but I want to avoid redoing work I've already done recently. – Ken Williams May 18 '17 at 21:06
  • No, it needs to see the entire `DatetimeIndex` object. It's a function of the entire object. – Ken Williams May 18 '17 at 21:11
  • I'm sorry @MaxU, but this is already the exact question I want to ask, and I really don't get why you think it's ill-defined or shouldn't be asked. The question is about memoizing and hashing of `DatetimeIndex` objects, not whether I can optimize some other calling code that's not up for discussion. – Ken Williams May 18 '17 at 21:35
  • ok, sorry for that - i have deleted all my "off-topic" questions! good luck! – MaxU - stand with Ukraine May 18 '17 at 21:37

1 Answer


I ended up writing my own decorator so I could memoize methods that accept DataFrame arguments (or any Hashable argument, in case DataFrames become hashable in the future). It looks like this:

import collections.abc
import functools

def my_memoize(func):
    # TODO use an LRU cache so we don't grow forever
    cache = {}

    @functools.wraps(func)
    def decorating_function(self, arg):
        # Unhashable arguments (e.g. DataFrame) fall back to id()
        key = arg if isinstance(arg, collections.abc.Hashable) else id(arg)
        key = (self, key)
        if key in cache:
            return cache[key]
        value = func(self, arg)
        cache[key] = value
        return value

    return decorating_function

I use it like so:

@my_memoize
def calculate_stuff(self, df):
    ...
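To show the caching behavior end to end, here is a self-contained demonstration (my own; the `Analyzer` class and its call counter are invented for illustration, and a plain list stands in for a DataFrame since both are unhashable):

```python
import collections.abc

def my_memoize(func):
    cache = {}

    def decorating_function(self, arg):
        # Unhashable arguments (e.g. DataFrame) fall back to id()
        key = arg if isinstance(arg, collections.abc.Hashable) else id(arg)
        key = (self, key)
        if key in cache:
            return cache[key]
        value = func(self, arg)
        cache[key] = value
        return value

    return decorating_function

class Analyzer:
    def __init__(self):
        self.calls = 0

    @my_memoize
    def calculate_stuff(self, df):
        self.calls += 1  # count how often the real work runs
        return len(df)

a = Analyzer()
data = [1, 2, 3]  # unhashable, like a DataFrame, so id() is used
print(a.calculate_stuff(data))  # 3
print(a.calculate_stuff(data))  # 3, served from the cache
print(a.calls)  # 1
```

Because the key includes `self`, two different instances keep separate cache entries even for the same argument.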
  • This will return an incorrect result if a DatetimeIndex is deallocated and a new, non-equivalent one is subsequently allocated at same memory address. The problem with using id() as part of a cache key is that it carries no guarantee of uniqueness for objects with non-overlapping lifetimes. – TJM Nov 01 '20 at 17:44