Say I have an external R script, external.R:
df.rand <- data.frame(x = rnorm(n = 100), y = rnorm(n = 100))
Then there's a main.Rmd:
\documentclass{article}
\begin{document}
<<setup, include = FALSE>>=
library(knitr)
library(ggplot2)
# global chunk options
opts_chunk$set(cache=TRUE, autodep=TRUE, concordance=TRUE, progress=TRUE, cache.extra = tools::md5sum("external.R"))
@
<<source, include=FALSE>>=
source("external.R")
@
<<plot>>=
ggplot(data = df.rand, mapping = aes(x = x, y = y)) + geom_point()
@
\end{document}
It's helpful to keep this in an external script because, in reality, it's a bunch of import, data-cleaning, and simulation tasks that would clutter main.Rmd.
All chunks in main.Rmd depend on the external script, so they need to be re-run whenever it changes.
To account for this dependency, I added cache.extra = tools::md5sum("external.R") to the global chunk options above.
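For reference, here is a minimal sketch of what that option relies on, as far as I understand it. A throwaway file in tempdir() stands in for external.R, and the names hash_before, hash_after and hash_missing are only for illustration.

```r
# Sketch only: a throwaway file stands in for external.R.
script <- file.path(tempdir(), "external.R")
writeLines("df.rand <- data.frame(x = 1:3)", script)
hash_before <- tools::md5sum(script)

# Any edit to the file, even an added comment, yields a different checksum,
# which is why every cached chunk is invalidated at once.
writeLines(c("# a new comment", "df.rand <- data.frame(x = 1:3)"), script)
hash_after <- tools::md5sum(script)
print(unname(hash_before) != unname(hash_after))

# A path that cannot be read gives NA rather than an error, so a wrong
# file name would presumably leave the cache blind to changes in the script.
hash_missing <- tools::md5sum(file.path(tempdir(), "no-such-script.R"))
print(is.na(hash_missing))
```

So the file name passed to md5sum() has to match the file on disk exactly, including case on case-sensitive file systems.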
That seems to work OK.
I'm looking for best practices.
- Is this robust (enough)?
- Is there a more elegant way to do this? (For example, it's unfortunate that any change in external.R triggers a complete cache invalidation, rather than invalidating only those objects that actually change.)
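To make the second question concrete, this is roughly the kind of per-object invalidation I have in mind. It is only a sketch: object_hash() is a hypothetical helper I made up, not part of knitr, and it assumes the chunk that calls source() is left uncached so that df.rand exists when the chunk options are evaluated.

```r
# Hypothetical helper (not part of knitr): checksum of a single R object,
# computed by serializing it uncompressed to a temporary file.
object_hash <- function(obj) {
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp))
  saveRDS(obj, tmp, compress = FALSE)
  unname(tools::md5sum(tmp))
}

a <- data.frame(x = c(1, 2, 3))
b <- data.frame(x = c(1, 2, 3))   # same content as a
d <- data.frame(x = c(1, 2, 4))   # one value differs

print(identical(object_hash(a), object_hash(b)))
print(identical(object_hash(a), object_hash(d)))

# Intended use in the document, replacing the global cache.extra:
# <<plot, cache.extra = object_hash(df.rand)>>=
```

With this, a chunk would be invalidated only when the object it uses changes. For my example that would also require a set.seed() call in external.R, since otherwise rnorm() produces new data on every run.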
There are no side effects (except for some library() calls, but I can move them to main.Rmd).
I'm always worried that I'm somehow doing it wrong.