1

How can I cache my query data result with memoise?

I have tried it with my data and it has no difference at all.

this is my function,

runQuery <- function(DB, query, site) {

    # Match the pattern and replace it.
    dataQuery <- sub("SITE", as.character(site), query)

    # Store the result in data1.
    data = dbGetQuery(DB, dataQuery)

    return(data)
}

memoise_data <- memoise(runQuery)

data1 <- memoise_data(DB, dataQuery, site1)

The plot still takes 21.61824 seconds whether with memoise or not.

Any ideas?

Run
  • 54,938
  • 169
  • 450
  • 748

3 Answers3

5

You only get a speed-up on the second and subsequent invocations of the memoized function. On the first invocation, it runs the 'wrapped' function and caches its output. You should notice the difference on the second call to memoise_data().

For example:

f <- function() { 
  a <- rnorm(1e5)
  5 
} 
system.time(f())
user  system elapsed 
0.05    0.00    0.05 
mf <- memoise::memoise(f)
system.time(mf())
user  system elapsed 
0.05    0.00    0.05 
system.time(mf())
user  system elapsed 
0       0       0 
bojan
  • 363
  • 3
  • 5
2

I have had similar issues with memoise in order to cache results of a calculation where the function contains inputs of type data.table. At a guess I would say that memoise is storing function results against a key related to the function inputs and that the problem is with the key generation algorithm. That issue is either:

  1. A key cannot be formed from certain input types and so the result is never cached, or
  2. The key algorithm is creating a varying key on each call.

Either of which means you are recalculating/reprocessing on each function call. This certainly fits with what I have been witnessing. function(string) caches nicely, whereas function(data.table, string) does not.

Mark
  • 43
  • 7
1

What's the bottleneck of your code?

  • Is it the database query itself?
  • Is it the time spent on sending and receiving?

I don't think it's a good idea to cache database query though, by definition you could get different result with exactly same DB name and same query if DB updated. This is not safe to cache.

Database itself may have some cache mechanisms to speed up repeat inquiries.

dracodoc
  • 2,603
  • 1
  • 23
  • 33