2

I've run a plan to create a large set of objects in drake's cache. Now, outside of a plan, I ran lapply over a subset of those objects so I can summarize some of their properties and plan my next steps.

I'm using readd to load each cached object inside the function I pass to lapply, but the objects still seem to occupy RAM after I'm done with them. That's a problem in my scenario because memory use reaches 100 GiB by the time the run finishes. I'm not sure where in the environment to look for them if I need to remove them explicitly.

I understand that drake is doing something similar to memoization with the cache: if I readd the same object twice, the first call takes time to read from disk and the second is instantaneous. But in this scenario, I'd like to treat the cache as a simple data source like any other file, so that an object no longer takes up RAM once it's rm()'d or goes out of scope.

rushgeo

1 Answer

1

Figured it out! It looks like the storr object returned by get_cache or new_cache has a flush_cache method. Calling that, then gc(), returns the memory.
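A minimal sketch of that flush-then-collect sequence, using a throwaway storr in a temporary directory as a stand-in for drake's cache. The key names and object sizes are made up for illustration. In a real drake session, the storr object would come from get_cache() (or drake_cache() in newer drake versions) rather than storr_rds().

```r
library(storr)

# Throwaway on-disk storr standing in for drake's cache (hypothetical path).
st <- storr_rds(tempfile("storr_demo_"))

# Store a few moderately large objects. use_cache = FALSE keeps them out of
# storr's in-memory cache at write time.
keys <- paste0("obj_", 1:3)
for (k in keys) st$set(k, rnorm(1e6), use_cache = FALSE)

# Reading with the default use_cache = TRUE leaves a copy of each object
# inside the storr object, even after the function's local variables are gone.
summaries <- lapply(keys, function(k) mean(st$get(k)))

# Drop storr's in-memory copies, then ask R to reclaim the memory.
st$flush_cache()
invisible(gc())

# Clean up the temporary store.
st$destroy()
```

Only the in-memory copies are discarded by flush_cache(); the objects on disk are untouched until the store itself is destroyed.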

Should flush_cache be documented somewhere in drake, even though it comes from storr?

I also found that if I call readd from multiple processes with mclapply, the objects don't stay in RAM, since they don't get transferred back to the main process.
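A rough illustration of the multi-process variant, again with a temporary storr standing in for drake's cache and with made-up keys. With drake itself, the worker function would presumably call readd(k, character_only = TRUE) instead of st$get(k). Forking is not available on Windows, so the sketch falls back to one core there, in which case the reads happen in the main process after all.

```r
library(parallel)
library(storr)

# Temporary storr with a few objects (hypothetical keys and sizes).
st <- storr_rds(tempfile("storr_demo_"))
keys <- paste0("obj_", 1:4)
for (k in keys) st$set(k, rnorm(1e5), use_cache = FALSE)

# Each call loads one object and returns only a small summary.
summarise_one <- function(k) {
  obj <- st$get(k)  # any in-memory caching happens in the forked child
  c(mean = mean(obj), sd = sd(obj))
}

# mclapply forks on Unix-alikes; on Windows it only supports mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(keys, summarise_one, mc.cores = cores)

st$destroy()
```

Only the small summary vectors travel back to the parent, so the loaded objects disappear along with the child processes.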

rushgeo
  • 1
    Might be worth asking on the [drake github page](https://github.com/ropensci/drake) if you think the docs need updating, or have a suggestion for improvements. – SymbolixAU May 31 '19 at 00:10
  • 2
    Glad you figured this out before I got to it. The main issue in `drake` should now be fixed in https://github.com/ropensci/drake/commit/6a05b979283f79a2ea8444020c5450ad2b9611d0 and https://github.com/ropensci/drake/commit/bf88250f7ea63b545a421f08433ec64c74ec79e8. I just set `use_cache` to `FALSE` in the `storr` function calls that save and load targets. `drake` has its own memory management strategy (see the `memory_management` and `garbage_collection` arguments of `make()`) so we do not need `storr`'s in-memory cache anyway. So we probably don't need more docs at this point. – landau May 31 '19 at 00:23
  • 1
    Major update: the manual has a new chapter on memory management: https://ropenscilabs.github.io/drake-manual/memory.html. Some of the features are only available in the newly-released `drake` version 7.4.0. – landau Jun 11 '19 at 11:30
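
For reference, the `use_cache` switch mentioned in the comments is an argument of storr's `set()` and `get()` methods. The sketch below shows it on a temporary storr with a made-up key. It is not drake's internal code, only an illustration of what bypassing the in-memory cache looks like at the storr level.

```r
library(storr)

# Temporary storr (hypothetical path and key).
st <- storr_rds(tempfile("storr_demo_"))

# Write and read without touching storr's in-memory cache.
st$set("big", rnorm(1e6), use_cache = FALSE)
x <- st$get("big", use_cache = FALSE)
ok <- is.numeric(x) && length(x) == 1e6

# Once x is removed, no other in-memory copy is held by the storr object.
rm(x)
invisible(gc())

st$destroy()
```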