In a vein similar to this question: I'm writing a package and using knitr to write a few documents in inst/doc/. Since I'm using GitHub to host my repo (and I intend to point people to that repo to get the package), I'm wondering whether I should be version controlling the caches of my various documents.

I ask this question because it's unclear where cache falls in the guidelines provided by this other question (which addresses when certain file types should and shouldn't be in the .gitignore of a repo).

Can anyone shed some light on how package developers who use knitr and git are handling their caches?

StevieP

1 Answer

If R CMD check passes without the knitr cache, and I think it would, I wouldn't include it. In fact, I suspect R CMD check would give a note about the cache files being present in the package. I know that for LaTeX files, you only want to include the .tex file in the R package and in version control. The other required files should be generated automatically on install.
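
As a rough sketch of what leaving the cache out of version control could look like, here is a `.gitignore` fragment. The directory names are assumptions: `cache/` and `figure/` are knitr's default output locations when `knit()` is called directly, and `*_cache/` is the pattern rmarkdown uses; adjust them if `cache.path` is set differently in your documents.

```gitignore
# Assumed layout: knitr documents live in inst/doc/ (as in the question).
# knitr's default cache and figure directories when knit() is called directly:
inst/doc/cache/
inst/doc/figure/
# rmarkdown-style cache directories (e.g. mydoc_cache/):
inst/doc/*_cache/
```

To also keep these directories out of the built tarball, a matching regular expression such as `^inst/doc/cache$` could go in `.Rbuildignore`.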

iacobus
  • The issue is that my vignettes are going to include some (potentially) lengthy calculations and I want to spare people (and myself) from having to redo those simulations every time the package is to be built... – StevieP Mar 26 '14 at 06:41
  • @StevieP perhaps you could go about it in a more transparent manner, perhaps save the objects in a RData file (and have it documented)? – Roman Luštrik Mar 26 '14 at 06:46
  • @StevieP, fair point. My personal experience learning package development from an R Core Team member was that in package source files, weird non-standard things weren't well received. Looking over the extensions manual, it seems like that may have been more his preferences than something strongly enforced by CRAN. As long as it passes `R CMD check`, include whatever you need to keep the run-time reasonable. – iacobus Mar 26 '14 at 06:55
  • @iacobus Distributing simulation/lengthy calculations as package data objects is perfectly acceptable and won't get frowned upon by CRAN. As long as the data objects (results of the calculations) aren't too large in terms of disk space & your package won't change too often (CRAN keeps archived tarballs of the package sources for all versions submitted to CRAN). If the latter is not true, put the vignette calculations in a separate data-only package and have your main package depend on this package via `Suggests:` in `DESCRIPTION`. – Gavin Simpson Mar 26 '14 at 15:19
  • @RomanLuštrik It seems like this solution is the "six of one" to the "half a dozen" that is packaging the cache. At the end of the day, I'm striving for reproducible research, meaning that (provided RNG seeds) anyone, anywhere, can push the compile button and reach the same results as me. The problem with RData files is that (by themselves) they don't really allow replication... Am I making mountains from molehills, here? – StevieP Mar 26 '14 at 17:23
  • @StevieP, aren't vignettes typically stored as PDFs in the package? Recompiling the package (and therefore the vignettes) would certainly benefit from keeping the cache, otherwise your overall efficiency isn't improved a lot. If you are including any `demo`s, though, that's different. – r2evans Apr 22 '14 at 16:19
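
One way to reconcile the RData suggestion in the comments above with the reproducibility concern is to ship the stored results together with the seeded script that generates them. The following is only a minimal sketch: `run_simulation`, `sim_results`, the seed, and the file name are hypothetical, and `tempdir()` stands in for a package location such as `data/` so the example is self-contained.

```r
# Hypothetical stand-in for a lengthy vignette calculation.
# The fixed seed means the stored results can be regenerated exactly.
run_simulation <- function(n_reps = 1000, n_obs = 100, seed = 1) {
  set.seed(seed)
  replicate(n_reps, mean(rnorm(n_obs)))
}

# In a real package this might be "data/sim_results.RData";
# tempdir() is used here only to keep the sketch self-contained.
results_file <- file.path(tempdir(), "sim_results.RData")

# Run the expensive step only when the stored results are missing.
if (!file.exists(results_file)) {
  sim_results <- run_simulation()
  save(sim_results, file = results_file)
}

# The vignette loads the stored object instead of recomputing it.
load(results_file)
summary(sim_results)
```

Anyone who wants to replicate the results can delete the stored file and rerun the script; with the same seed and the same R version, `run_simulation()` should return the same values.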