Selective Re-Memoization of DataFrames

Question

Say I set up memoization with Joblib as follows (using the solution provided here):

from tempfile import mkdtemp
cachedir = mkdtemp()

from joblib import Memory
memory = Memory(cachedir=cachedir, verbose=0)

@memory.cache
def run_my_query(my_query):
    ...
    return df

And say I define a couple of queries, query_1 and query_2, both of which take a long time to run.

I understand that, with the code as it is:

The second call with either query would use the memoized output, i.e.:

run_my_query(query_1)
run_my_query(query_1) # <- Uses cached output

run_my_query(query_2)
run_my_query(query_2) # <- Uses cached output

I could use memory.clear() to delete the entire cache directory.

But what if I want to re-do the memoization for only one of the queries (e.g. query_2) without forcing a delete on the other query?

seems that [`.call`](https://pythonhosted.org/joblib/memory.html#joblib.memory.MemorizedFunc.call) _forces_ the computation. you may want to check if it updates the cache as well. — behzad.nouri, Sep 23 '14 at 15:04
@behzad.nouri great pointer! I was hoping for a method like that. I looked for the keyword `force` but didn't find anything. `.call` could be the answer. I will check it. — Amelio Vazquez-Reina, Sep 23 '14 at 15:06
@behzad.nouri Looking at the [code](https://github.com/joblib/joblib/blob/master/joblib/memory.py#L665-L682), it seems to call `persist_output` so I think it will do the trick! — Amelio Vazquez-Reina, Sep 23 '14 at 15:10

falsetru · Accepted Answer · 2014-09-23T15:11:42.937

It seems like the library does not support partial erase of the cache.

You can split the cache and the function into two pairs:

from tempfile import mkdtemp
from joblib import Memory

memory1 = Memory(cachedir=mkdtemp(), verbose=0)
memory2 = Memory(cachedir=mkdtemp(), verbose=0)

@memory1.cache
def run_my_query1():
    # run query_1
    return df

@memory2.cache
def run_my_query2():
    # run query_2
    return df

Now, you can selectively clear the cache:

memory2.clear()

UPDATE after seeing behzad.nouri's comment:

You can use the call method of the decorated function. As the following example shows, its return value differs from that of a normal call (here, a tuple of the result and some metadata), so you need to handle it accordingly.

>>> import tempfile
>>> import joblib
>>> memory = joblib.Memory(cachedir=tempfile.mkdtemp(), verbose=0)
>>> @memory.cache
... def run(x):
...     print('called with {}'.format(x))  # for debug
...     return x
...
>>> run(1)
called with 1
1
>>> run(2)
called with 2
2
>>> run(3)
called with 3
3
>>> run(2)  # Cached
2
>>> run.call(2)  # Force call of the original function
called with 2
(2, {'duration': 0.0011069774627685547, 'input_args': {'x': '2'}})
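
If all you need is a refreshed cache entry, one way to avoid depending on the shape of the value returned by .call is to discard it and read the result back through the normal cached call. The sketch below assumes a recent joblib, where Memory takes location rather than cachedir. refresh is a hypothetical helper, not part of joblib, and the string result stands in for a DataFrame.

```python
import tempfile

from joblib import Memory

# Assumption: recent joblib releases name this parameter `location`.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = []  # records every real execution, for illustration only


@memory.cache
def run_my_query(my_query):
    calls.append(my_query)
    return "result of {}".format(my_query)  # stand-in for a DataFrame


def refresh(cached_func, *args, **kwargs):
    """Hypothetical helper: recompute one entry and return the plain result."""
    # Force the computation; joblib persists the new output to the cache.
    cached_func.call(*args, **kwargs)
    # Read the refreshed value back through the ordinary cached path.
    return cached_func(*args, **kwargs)


run_my_query("query_1")
run_my_query("query_2")
run_my_query("query_2")           # served from the cache
refresh(run_my_query, "query_2")  # re-runs query_2 only
run_my_query("query_1")           # still served from the cache
print(calls)
```

The trade-off is one extra read from disk after the forced call, which is usually small next to the cost of a slow query.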

I am hoping to run dozens of queries, so this may not scale, but thanks, that's helpful. — Amelio Vazquez-Reina, Sep 23 '14 at 15:05
@user815423426, After seeing behzad.nouri's comment, I updated the answer. — falsetru, Sep 23 '14 at 15:12
[This comment](https://twitter.com/GaelVaroquaux/status/562735568346689536) from the author sheds more light on this. — Amelio Vazquez-Reina, Feb 04 '15 at 00:05

score 1 · Answer 2 · answered Aug 31 '21 at 14:45

It's been a few years, but if your code allows you to refactor into separate functions, you can call func.clear() to clear the cache for that function alone, leaving the other functions' cached results in place.

Example code:

#!/usr/bin/env python

import sys
from shutil import rmtree

import joblib

cachedir = "joblib-cache"
memory = joblib.Memory(cachedir)


@memory.cache
def foo():
    print("running foo")
    return 42


@memory.cache
def oof():
    print("running oof")
    return 24


def main():
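    # Start from an empty cache; ignore_errors avoids a failure when the
    # directory does not exist yet (e.g. on the very first run).
    rmtree(cachedir, ignore_errors=True)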

    print(f"{sys.version=}")
    print(f"{joblib.__version__=}")

    print(foo())
    print(oof())
    print()

    print("*" * 20 + " These should now be cached " + "*" * 20)
    print(foo())
    print(oof())
    print()

    foo.clear()
    print("*" * 20 + " `foo` should now be recaculated " + "*" * 20)
    print(foo())
    print(oof())


if __name__ == "__main__":
    main()

Output:

sys.version='3.9.6 (default, Jun 30 2021, 10:22:16) \n[GCC 11.1.0]'
joblib.__version__='1.0.1'
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.foo...
foo()
running foo
______________________________________________________________foo - 0.0s, 0.0min
42
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.oof...
oof()
running oof
______________________________________________________________oof - 0.0s, 0.0min
24

******************** These should now be cached ********************
42
24

WARNING:root:[MemorizedFunc(func=<function foo at 0x7f9cd7d8e040>, location=joblib-cache/joblib)]: Clearing function cache identified by __main__--tmp-tmp/DaQHHlsA2H-clearcache/foo
******************** `foo` should now be recaculated ********************
________________________________________________________________________________
[Memory] Calling __main__--tmp-tmp.DaQHHlsA2H-clearcache.foo...
foo()
running foo
______________________________________________________________foo - 0.0s, 0.0min
42
24
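
When all queries go through a single function, as in the question, clearing the whole function is still too coarse. One alternative worth checking against your joblib version is call_and_shelve, which returns a reference to a single cached result, and that reference has its own clear(). A minimal sketch, assuming a recent joblib (location instead of cachedir); invalidate is a hypothetical helper name and the string result stands in for a DataFrame.

```python
import tempfile

from joblib import Memory

# Assumption: recent joblib releases name this parameter `location`.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

executed = []  # records every real execution, for illustration only


@memory.cache
def run_my_query(my_query):
    executed.append(my_query)
    return "result of {}".format(my_query)  # stand-in for a DataFrame


def invalidate(cached_func, *args, **kwargs):
    """Hypothetical helper: drop the cache entry for one set of arguments."""
    # call_and_shelve returns a reference to the stored result without
    # loading it. If nothing is cached yet, it runs the function first.
    cached_func.call_and_shelve(*args, **kwargs).clear()


run_my_query("query_1")
run_my_query("query_2")
invalidate(run_my_query, "query_2")  # removes only the query_2 entry
run_my_query("query_1")              # still served from the cache
run_my_query("query_2")              # recomputed and cached again
print(executed)
```

Note that invalidating an argument set that was never cached triggers one computation before the entry is removed, so this helper is best reserved for entries known to exist.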
