Finally, I've decided to move my dissertation research closer to the goal of making it as reproducible as it can be, given my circumstances. Since I currently don't use LaTeX for my dissertation report (though I'm considering this option), I believe that knitr is the best way to go.
The software project implementing the empirical part of my dissertation research (data analysis) is being written in R. The project contains multiple files within a directory structure that is rather typical for scientific workflows (top-level sub-directories: analysis, cache, data, figures, import, prepare, present, results, sandbox, utils).
I have read a lot of material (including examples) on using knitr for auto-generating reports and on reproducible research in general. However, I'm somewhat overwhelmed by the multitude of configuration options and, more importantly, still confused about the best/correct/optimal approach to using knitr in projects like mine, which contain multiple files and directories. In particular, I'm interested in advice on a framework and steps for transitioning an existing codebase without too many modifications to the R modules.
As an example, let's consider my modules related to exploratory data analysis (EDA). My current EDA workflow includes:
preliminary data, transformed from the original raw data (located in "data/transform" sub-directories);
the module "eda.R", located in the "analysis" directory;
the directory "results/eda", where my current code generates figures (SVG files) of univariate and multivariate EDA, as well as a single report document (PDF file) containing the same graphics-only information (the descriptive statistics are printed to the console when the "eda.R" script runs).
In order to transition to a knitr-based project, I have created the file "eda-report.Rmd" with R Markdown statements for setting local knitr options, including read_chunk("eda.R"). My understanding is that I now need to mark the existing blocks of R code in "eda.R" as knitr chunks and then call these named chunks according to my EDA workflow.
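To make this concrete, here is a minimal sketch of how I imagine "eda.R" would look once its blocks are labeled for read_chunk(). The chunk labels (eda-setup, eda-descriptive, eda-univariate) and the small data frame are hypothetical placeholders, not my actual code; the only assumption about knitr is its `## ---- label ----` comment convention for external chunks.

```r
# eda.R (sketch): existing code, unchanged except for the label comments.
# All labels and the data below are hypothetical placeholders.

## ---- eda-setup ----
# Stand-in for the preliminary data that would come from "data/transform"
eda_data <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

## ---- eda-descriptive ----
# Descriptive statistics (currently printed to the console)
eda_summary <- summary(eda_data)
print(eda_summary)

## ---- eda-univariate ----
# One univariate plot as a stand-in for the real EDA figures
hist(eda_data$x, main = "Distribution of x")

# In "eda-report.Rmd", a setup chunk would call
#   knitr::read_chunk("eda.R")
# (path relative to wherever the .Rmd file lives), and an otherwise
# empty chunk with a matching label, e.g. {r eda-descriptive},
# would then run the code stored under that label.
```

Since the labels are ordinary R comments, the script should still run unchanged when sourced directly, which is what I mean by "without too many modifications".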
Questions:
Is this the correct approach?
What are the best practices for using knitr with regard to setting up project paths, using source(), grouping plots via gridExtra, and preventing potential issues?
It seems to me that, in addition to "eda-report.Rmd", I need to create another R module that initiates processing of the .Rmd file by knitr. If so, which call should I use: rmarkdown::render() or knitr::knit()? (While I use RStudio for development, I want my code to be independent of the development environment.)
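For reference, this is roughly the kind of driver module I have in mind. It is only a sketch: the file name "build-report.R", the function name render_report(), and the choice of "results/eda" as the output directory are my own placeholders, and it uses rmarkdown::render() purely as one of the two candidates mentioned above.

```r
# build-report.R (hypothetical name): processes the report outside RStudio.
# Assumes the rmarkdown package (and pandoc) are installed.

render_report <- function(input = "eda-report.Rmd",
                          output_dir = "results/eda") {
  # Fail early with a clear message if the package is missing
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Package 'rmarkdown' is required to render ", input)
  }
  # Fail early if the report source cannot be found
  if (!file.exists(input)) {
    stop("Input file not found: ", input)
  }
  # Output format is taken from the YAML header of the .Rmd file
  rmarkdown::render(input, output_dir = output_dir)
}

# Usage, from the directory containing the .Rmd file:
# render_report()
```

The idea would be to call this from a Makefile via Rscript, so that nothing depends on the IDE.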
UPDATE 1 (Additional question):
Why does processing an .Rmd file in RStudio via the "Knit HTML" button produce an HTML document, while processing it via the Makefile command Rscript -e 'library("knitr"); knit("eda-report.Rmd")' produces an .md file but not HTML, despite the presence of the output: html_document directive?
Thank you for reading this! Your advice will be greatly appreciated!