
Say I have a function that runs a SQL query and returns a dataframe:

import pandas.io.sql as psql
import sqlalchemy

query_string = "select a from table;"

def run_my_query(my_query):
    # username, host, port and database are hard-coded here
    engine = sqlalchemy.create_engine('postgresql://{username}@{host}:{port}/{database}'.format(username=username, host=host, port=port, database=database))

    df = psql.read_sql(my_query, engine)
    return df

# Run the query (this is what I want to memoize)
df = run_my_query(query_string)

I would like to:

  1. Be able to memoize my query above with one cache entry per value of query_string (i.e. per query)
  2. Be able to force a cache reset on demand (e.g. based on some flag), so that I can refresh the cache if I think the database has changed.

How can I do this with joblib or jug?

Amelio Vazquez-Reina

1 Answer


Yes, you can do this with joblib (this example basically pastes itself):

>>> from tempfile import mkdtemp
>>> cachedir = mkdtemp()

>>> from joblib import Memory
>>> # current joblib releases call this argument `location` (older ones used `cachedir`)
>>> memory = Memory(location=cachedir, verbose=0)

>>> @memory.cache
... def run_my_query(my_query):
...     ...
...     return df

You can clear the cache using memory.clear().
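As a rough sketch of the "force a reset based on some flag" part of the question, the cached function can be wrapped in a thin helper. The names `force_refresh`, `_run_query_uncached` and `CALLS` are made up for illustration, and the database call is replaced by a placeholder string so the example runs without a database:

```python
from tempfile import mkdtemp

from joblib import Memory

memory = Memory(location=mkdtemp(), verbose=0)

CALLS = []  # illustration only: records each time the query really runs


def _run_query_uncached(my_query):
    # Placeholder for the real work (psql.read_sql(my_query, engine) in the question).
    CALLS.append(my_query)
    return "result of " + my_query


# joblib keeps one on-disk cache entry per distinct argument value.
_cached_query = memory.cache(_run_query_uncached)


def run_my_query(my_query, force_refresh=False):
    # Hypothetical flag: when set, drop this function's cached results first.
    if force_refresh:
        _cached_query.clear(warn=False)
    return _cached_query(my_query)


run_my_query("select a from table;")                      # runs the query
run_my_query("select a from table;")                      # served from the cache
run_my_query("select a from table;", force_refresh=True)  # runs the query again
```

Note that clearing this way discards the cached results for every query string passed to the function, not only the one being refreshed.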


Note that you could also use functools.lru_cache, or even memoize "manually" with a simple dict:

def run_my_query(my_query, cache={}):
    if my_query in cache:
        return cache[my_query]
    ...
    cache[my_query] = df
    return df

You could clear the cache with run_my_query.__defaults__[0].clear() (run_my_query.func_defaults[0].clear() in Python 2). I'm not sure I'd recommend this though, I just thought it was a fun example.
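For completeness, here is a minimal sketch of the lru_cache variant mentioned above. The `CALLS` list and the placeholder return value are assumptions standing in for the real database call:

```python
from functools import lru_cache

CALLS = []  # illustration only: records each time the query really runs


@lru_cache(maxsize=None)  # unbounded cache, one entry per distinct query string
def run_my_query(my_query):
    # Placeholder for the real work (psql.read_sql(my_query, engine) in the question).
    CALLS.append(my_query)
    return "result of " + my_query


run_my_query("select a from table;")  # runs the query
run_my_query("select a from table;")  # cache hit, the function body is skipped
run_my_query.cache_clear()            # force a reset of the whole cache
run_my_query("select a from table;")  # runs the query again
```

Unlike joblib's on-disk cache, this one lives in memory and is lost when the process exits. It also hands back the very same object on every hit, so if the real function returns a DataFrame, mutating that result in place would change what later callers see.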

Andy Hayden
  • Thanks @Andy. Is there any way to use this to memoize a sequence of Python statements without having to put them inside of a function first? If the statements modify several variables, it requires quite a bit of work to wrap everything into a function just to memoize the computation. – Amelio Vazquez-Reina Aug 29 '14 at 17:33
  • Memoising may not be a valid strategy when mutating variables... perhaps ask a separate question to explain what you're attempting? – Andy Hayden Aug 29 '14 at 20:13