8

I am trying to write a memoization library that uses shelve to store the return values persistently. If memoized functions call other memoized functions, how do I open the shelf file correctly?

import shelve
import functools


def cache(filename):
    def decorating_function(user_function):
        def wrapper(*args, **kwds):
            key = str(hash(functools._make_key(args, kwds, typed=False)))
            with shelve.open(filename, writeback=True) as cache:
                if key in cache:
                    return cache[key]
                else:
                    result = user_function(*args, **kwds)
                    cache[key] = result
                    return result

        return functools.update_wrapper(wrapper, user_function)

    return decorating_function


@cache(filename='cache')
def expensive_calculation():
    print('inside function')
    return


@cache(filename='cache')
def other_expensive_calculation():
    print('outside function')
    return expensive_calculation()

other_expensive_calculation()

Except this doesn't work:

$ python3 shelve_test.py
outside function
Traceback (most recent call last):
  File "shelve_test.py", line 33, in <module>
    other_expensive_calculation()
  File "shelve_test.py", line 13, in wrapper
    result = user_function(*args, **kwds)
  File "shelve_test.py", line 31, in other_expensive_calculation
    return expensive_calculation()
  File "shelve_test.py", line 9, in wrapper
    with shelve.open(filename, writeback=True) as cache:
  File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 239, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/shelve.py", line 223, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: [Errno 35] Resource temporarily unavailable

What would you recommend as a solution to this sort of problem?

saul.shanabrook
  • I think you should not have two open writing pointers to the same file; that will almost certainly lead to undesired behaviour. Instead, use `file.seek(0)` if you want to go back to the beginning of an open file – Joran Beasley Jul 24 '14 at 16:43
  • OK, makes sense, but I don't really want to go back to the beginning of any files. I basically want the second `open` to use the already opened file of the first if it has already been opened, and otherwise to open it. – saul.shanabrook Jul 24 '14 at 16:48
  • It's obviously still open, since you are still within its context block unless you explicitly closed it somewhere – Joran Beasley Jul 24 '14 at 16:50
  • @saul.shanabrook It's tough to tell you how you should do this without a better idea of how your library is organized. Are the memoization functions you're talking about all part of the same class? – dano Jul 24 '14 at 16:53
  • @dano I updated my question with a specific example. Does it make sense? – saul.shanabrook Jul 24 '14 at 17:03
  • @dano updated again, with working (except not) example – saul.shanabrook Jul 24 '14 at 17:13
  • Given your updated example, the real question is not "*can `open` be called in a nested fashion*?", but rather, "*can `shelve.open` be called in a nested fashion*?". – Robᵩ Jul 24 '14 at 17:30

3 Answers

5

No, you may not have nested shelve instances with the same filename.

The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used.

https://docs.python.org/3/library/shelve.html#restrictions
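The file-locking idea mentioned in the quoted documentation could look roughly like the sketch below. It is only an illustration under stated assumptions: `locked_shelf` and the separate `.lock` file are hypothetical names, and `fcntl.flock` is Unix-only and advisory. It serializes access between separate processes; it does not make nested opens inside one process work (a nested call would block on its own lock).

```python
import fcntl
import os
import shelve
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_shelf(filename):
    """Open a shelf while holding an exclusive advisory lock (Unix only)."""
    # Assumption of this sketch: lock a separate ".lock" file rather than
    # the database file, whose name and layout depend on the dbm backend.
    with open(filename + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            with shelve.open(filename) as shelf:
                yield shelf
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Usage: two sequential accesses, each made under the lock.
path = os.path.join(tempfile.mkdtemp(), "cache")
with locked_shelf(path) as shelf:
    shelf["answer"] = 42
with locked_shelf(path) as shelf:
    stored = shelf["answer"]
```

Every process that touches the shelf has to go through the same helper for the lock to mean anything, since advisory locks are ignored by code that does not ask for them.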

Robᵩ
2

Rather than trying to nest calls to open (which, as you have discovered, does not work), you could make your decorator maintain a reference to the handle returned by shelve.open, and then, if it exists and is still open, re-use that for subsequent calls:

import shelve
import functools

def _check_cache(cache_, key, func, args, kwargs):
    if key in cache_:
        print("Using cached results")
        return cache_[key]
    else:
        print("No cached results, calling function")
        result = func(*args, **kwargs)
        cache_[key] = result
        return result

def cache(filename):
    def decorating_function(user_function):
        def wrapper(*args, **kwds):
            args_key = str(hash(functools._make_key(args, kwds, typed=False)))
            func_key = '.'.join([user_function.__module__, user_function.__name__])
            key = func_key + args_key
            handle_name = "{}_handle".format(filename)
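            # A closed Shelf swaps its .dict for a placeholder object that
            # has a `closed` attribute, so its absence means "still open".
            # (This relies on a shelve implementation detail.)
            if (hasattr(cache, handle_name) and
                not hasattr(getattr(cache, handle_name).dict, "closed")
               ):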
                print("Using open handle")
                return _check_cache(getattr(cache, handle_name), key, 
                                    user_function, args, kwds)
            else:
                print("Opening handle")
                with shelve.open(filename, writeback=True) as c:
                    setattr(cache, handle_name, c)  # Save a reference to the open handle
                    return _check_cache(c, key, user_function, args, kwds)

        return functools.update_wrapper(wrapper, user_function)
    return decorating_function


@cache(filename='cache')
def expensive_calculation():
    print('inside function')
    return


@cache(filename='cache')
def other_expensive_calculation():
    print('outside function')
    return expensive_calculation()

other_expensive_calculation()
print("Again")
other_expensive_calculation()

Output:

Opening handle
No cached results, calling function
outside function
Using open handle
No cached results, calling function
inside function
Again
Opening handle
Using cached results

Edit:

You could also implement the decorator using a WeakValueDictionary, which looks a bit more readable:

from weakref import WeakValueDictionary

_handle_dict = WeakValueDictionary()
def cache(filename):
    def decorating_function(user_function):
        def wrapper(*args, **kwds):
            args_key = str(hash(functools._make_key(args, kwds, typed=False)))
            func_key = '.'.join([user_function.__module__, user_function.__name__])
            key = func_key + args_key
            handle_name = "{}_handle".format(filename)
            if handle_name in _handle_dict:
                print("Using open handle")
                return _check_cache(_handle_dict[handle_name], key, 
                                    user_function, args, kwds)
            else:
                print("Opening handle")
                with shelve.open(filename, writeback=True) as c:
                    _handle_dict[handle_name] = c
                    return _check_cache(c, key, user_function, args, kwds)

        return functools.update_wrapper(wrapper, user_function)
    return decorating_function

As soon as there are no other references to a handle, it will be deleted from the dictionary. Since our handle only goes out of scope when the outer-most call to a decorated function ends, we'll always have an entry in the dict while a handle is open, and no entry right after it closes.

dano
  • Doesn't this `with shelve.open(filename, writeback=True) as c:` close the shelf after that block? In which case it won't be open the next time? – saul.shanabrook Jul 24 '14 at 17:57
  • @saul.shanabrook Yes, but the decorator checks for that with the `not hasattr(getattr(cache, handle_name).dict, "closed")` part of the `if` statement. The handle's `.dict` will only have a `closed` attribute if the handle is closed. If we find it, we open the handle again. – dano Jul 24 '14 at 18:00
  • @saul.shanabrook Also, I just edited my answer so that the decorator supports keeping handles for multiple cache files. And I updated the output section to reflect the output when the cache doesn't already exist. – dano Jul 24 '14 at 18:02
  • I get an error when I run this script: https://gist.github.com/saulshanabrook/e5000eebeffde91c7453 – saul.shanabrook Jul 24 '14 at 18:04
  • Yep, the updated version works. Thanks so much! This is perfect – saul.shanabrook Jul 24 '14 at 18:06
  • I created an updated version that (I think) is a little simpler https://gist.github.com/saulshanabrook/e62a5669fe6fb87220d6 – saul.shanabrook Jul 24 '14 at 18:18
  • @saul.shanabrook It is simpler, but it won't work if you make nested calls to methods decorated using `cache`, but with different values for `filename`. It also won't close the shelve if an uncaught exception occurs in `user_function` (which is why you use `with shelve.open(...)` in the first place). – dano Jul 24 '14 at 18:20
  • Can you comment on the thread safety of your example? I.e. will it be safe to have decorated functions called from several threads? Will they then use the same handle? And will this be safe? – Marti Nito Mar 23 '15 at 12:37
  • @MartiNito It's not thread-safe as written. `shelve` [does not support concurrent read/write access to shelved objects](https://docs.python.org/2/library/shelve.html#restrictions) (though concurrent reads are ok). So you'd need to protect access to the shelve with a mutex to make sure only one thread is using it at a time. – dano Mar 23 '15 at 14:18
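The mutex suggested in the last comment could be sketched as follows. This is a minimal illustration, not part of the answer's code: `cached_call` and `_shelf_lock` are hypothetical names, and the lock is held for the whole open/use/close cycle so that only one thread ever has the shelf open. Because it is a plain (non-reentrant) lock, `compute` must not call `cached_call` itself; nested memoized calls would need to be combined with a shared-handle approach like the one in the answer.

```python
import os
import shelve
import tempfile
import threading

_shelf_lock = threading.Lock()  # hypothetical module-level mutex


def cached_call(filename, key, compute):
    """Return the cached value for key, computing it at most once."""
    with _shelf_lock:  # only one thread may open or use the shelf at a time
        with shelve.open(filename) as shelf:
            if key not in shelf:
                shelf[key] = compute()
            return shelf[key]


# Usage: several threads ask for the same value.
path = os.path.join(tempfile.mkdtemp(), "cache")
calls = []
results = []


def compute():
    calls.append(1)
    return 42


def worker():
    results.append(cached_call(path, "answer", compute))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Holding the lock while `compute` runs keeps the sketch simple but also serializes the expensive work; releasing it during the computation would need extra care to avoid duplicate calls.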
-1

You are opening the file twice but never actually closing it, so the file is never updated. Call f.close() at the end.

ChuNan
  • Just looked it over. You're not updating from the first part, so there is nowhere for the "there" to go, as it doesn't exist yet. It's like opening two windows of it – garrettparris Jul 24 '14 at 17:25