
Understandably, I broke the cache when updating a chunk (although the result should be the same, since the changes were cosmetic). However, I do not want to run the chunk again because it takes a week to run. How can I change the cache so that knitr treats it as valid for the new code?

I think I just need to change the file names in the cache folder. But I don't know what to rename them to without running the code, because knitr only writes the files after the chunk completes successfully.

Another motivation is that the knitr cache can be invalidated when using different knitr versions. This happened to me between 1.5 and 1.5.33 (development versions). See also the question "R knitr: is it possible to use cached results across different machines?". I think a solution to the problem above would help with this too.

Xu Wang
  • a week to run, a week?! – rawr May 21 '14 at 02:21
  • @rawr yes, they are long, compute-intensive simulations with big data. It is not complicated or fancy, just very longgg. – Xu Wang May 21 '14 at 02:25
  • this is pretty tricky, and is a reason I wouldn't use knitr's caching for something so intensive (I would load results from a separate batch run). Not easy, but I wonder if you can create a hacked version of the package that changes the code in https://github.com/yihui/knitr/blob/master/R/cache.R so you can figure out what the appropriate hash would be ... – Ben Bolker May 21 '14 at 02:30
  • can you write a script in something else for that chunk? the knitr log saves the names of the rda files – rawr May 21 '14 at 02:31
  • maybe copy the `rda` file corresponding to the chunk to somewhere else and load it manually? (and set `eval=FALSE` for the chunk) – Ben Bolker May 21 '14 at 02:42
  • @BenBolker good ideas. Yes, this is possible, and in fact I can access the direct output (I learned this from here: http://stackoverflow.com/questions/23721859/where-is-knitr-cached-output-stored). Although there are other ways, I am still curious how to do it this way, and I think I will learn more about knitr by trying. So part of this question is curiosity, because I can still recover the output. – Xu Wang May 21 '14 at 03:59
  • @BenBolker I added another motivation: when I use a different knitr version, the cache is invalidated. I also saw this happen in http://stackoverflow.com/questions/17060838/r-knitr-is-it-possible-to-use-cached-results-across-different-machines – Xu Wang May 21 '14 at 05:22
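Ben Bolker's suggestion (load the cached objects manually and set `eval=FALSE` on the chunk) can be sketched as below. This assumes knitr's default cache layout, where a chunk's objects are stored as a base R lazy-load database with files named `<label>_<hash>.rdb` and `<label>_<hash>.rdx` in the cache directory, which base R's `lazyLoad()` can read. The helper `load_chunk_cache()`, the `cache` directory and the label `simulation` are hypothetical; check the actual file names in your own cache folder.

```r
# Hypothetical helper (not part of knitr). Assumes the chunk's objects live
# in "<cache_dir>/<label>_<hash>.rdb" plus a matching ".rdx" file.
load_chunk_cache <- function(label, cache_dir = "cache") {
  # Find the lazy-load database whose file name starts with the chunk label
  rdb <- list.files(cache_dir,
                    pattern = paste0("^", label, "_[[:alnum:]]+\\.rdb$"),
                    full.names = TRUE)
  if (length(rdb) != 1) {
    stop("expected exactly one cache database for chunk '", label,
         "', found ", length(rdb))
  }
  env <- new.env()
  # lazyLoad() takes the path without the .rdb/.rdx extension
  lazyLoad(sub("\\.rdb$", "", rdb), envir = env)
  env
}

# Usage (hypothetical label): copy the .rdb/.rdx pair somewhere safe first,
# then in the report set eval=FALSE on the old chunk and run:
# sim <- load_chunk_cache("simulation")
# ls(sim)
```

Reading the objects into a fresh environment rather than the global one keeps them from silently overwriting variables in the report.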

1 Answer


Using the knitr cache to store the results of a week-long simulation sounds a bit crazy, and susceptible to disaster.

My suggestion for a safer workflow is:

  1. Run the simulation and store the results in a file (csv, rda, whatever is suitable).

  2. Load that data inside a chunk (probably with `echo = FALSE`) near the start of your knitr report.

Now simulating and reporting are decoupled.
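The two steps above can be sketched as follows. `saveRDS()`/`readRDS()` are one choice among the formats the answer mentions; the function `run_simulation()` and the file name `simulation-results.rds` are hypothetical placeholders for the real week-long job and its output.

```r
# --- Step 1: a separate script (e.g. run once with Rscript), outside knitr ---
# `run_simulation()` is a hypothetical stand-in for the expensive simulation.
run_simulation <- function(n_reps, seed) {
  set.seed(seed)
  replicate(n_reps, mean(rnorm(100)))
}
results <- run_simulation(n_reps = 1000, seed = 1)
saveRDS(results, "simulation-results.rds")

# --- Step 2: a chunk near the start of the report (e.g. echo = FALSE) ---
# The chunk only reads the stored file, so re-knitting never reruns step 1,
# and cosmetic edits to the report cannot invalidate anything.
results <- readRDS("simulation-results.rds")
summary(results)
```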

Richie Cotton
  • "Now simulating and reporting are decoupled" <-> now the document is no longer reproducible? – Xu Wang May 21 '14 at 15:16
  • @XuWang Sometimes I would rather be pragmatic instead of sticking to an ideal principle. In your case, I'd follow Richie Cotton's advice, and decouple the _really_ heavy computing from reporting. You can write your own code to make sure your simulation results can be appropriately updated when necessary, instead of relying on knitr's caching system. – Yihui Xie May 21 '14 at 19:49
  • @Yihui I see. I always thought caching was meant exactly for this purpose, long computations? IMHO it does not make sense to use caching for short computation times. So is caching meant for medium-length computations? – Xu Wang May 21 '14 at 23:29
  • @XuWang Caching _is_ for that objective, but the caching system in knitr might be too sensitive. If your computation takes a week, you certainly do not want to break the cache frequently. – Yihui Xie May 22 '14 at 01:58
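One way to "write your own code" that updates simulation results only when necessary, while keeping the report reproducible from scratch, is a compute-if-missing helper keyed on the simulation parameters. This is a minimal sketch, not knitr's mechanism: the helper `cached_run()`, the `sim-cache` directory and the parameter names are hypothetical, and the key only covers what you pass in `params` (scalar values), so changes to the simulation code itself must be handled by deleting the stored file.

```r
# Hypothetical helper: rerun the expensive step only when no stored result
# exists for these parameters. Cosmetic edits to the report (or a knitr
# upgrade) do not change the key, so they never trigger a rerun.
cached_run <- function(params, compute, dir = "sim-cache") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Encode scalar parameters in the file name, e.g. "n_reps=1000_seed=1.rds"
  key <- paste(names(params), unlist(params), sep = "=", collapse = "_")
  path <- file.path(dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- do.call(compute, params)
  saveRDS(result, path)
  result
}
```

Called from a report chunk, this reproduces everything when the `sim-cache` folder is empty, but reuses the stored result otherwise; changing a parameter writes a new file instead of overwriting the old one.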