
I have a large, complicated data-assembly workflow in R (many initial inputs, recoding, merges, dropped observations, etc.), and I do that work within many isolated functions: one for each input type, each merge, and each data-manipulation step. Right now only the final "analysis dataset" is returned to the global environment.

However, I now want to write a knitr document that documents the data-assembly process, and all of the intermediate objects (data frames/tibbles) are local to the functions in which they are assembled, which I take to be good practice.

The options seem to be:

  • I could write lots of interim data objects to the global environment, but that would clutter it, and I would like to keep it neat.

  • I could return lists of interesting attributes (N, merge success info, structures, etc.) from each function to the global environment. That is a little neater, but not completely efficient.
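A minimal sketch of the second option, assuming each step function hands back its result together with a small log list. The function name `merge_step()`, the toy data, and the log fields are hypothetical, chosen only for illustration.

```r
# Hypothetical merge step: returns the merged data plus a log of what happened,
# so a report can inspect the step without the function's locals leaking out.
merge_step <- function(x, y, by) {
  merged <- merge(x, y, by = by)  # base R inner join
  log <- list(
    n_x       = nrow(x),
    n_y       = nrow(y),
    n_merged  = nrow(merged),
    # rows of x whose key has no match in y (these drop out of an inner join)
    x_dropped = nrow(x) - sum(x[[by]] %in% y[[by]])
  )
  list(data = merged, log = log)
}

# Toy inputs (hypothetical)
patients <- data.frame(id = 1:4, age = c(34, 51, 29, 62))
visits   <- data.frame(id = c(1L, 2L, 2L, 5L), visit = c("a", "b", "c", "d"))

res <- merge_step(patients, visits, by = "id")
str(res$log)
```

Only `res` reaches the calling environment, and the knitr document can print or tabulate `res$log` while the merged data travels on in `res$data`.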

This is clearly not a new problem. I would welcome suggestions on the best way(s) forward.

user2292410

2 Answers


Have you considered using knitr::spin? There are three types of comments that are used to define how the end file will be rendered.

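  1. # for a standard R comment, which stays inside the code chunk
  2. #' at the beginning of a line, for text rendered as markdown
  3. #+ at the beginning of a line, for chunk options applied to the code that follows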

If you write your assembly steps as a data-assembly.R script and then call knitr::spin("data-assembly.R"), an .html file is generated that may provide the detail you need.
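A small sketch for checking how spin() treats the three comment types before rendering a full report. It assumes knitr is installed; the script is held in a character vector (a stand-in for data-assembly.R) so the example is self-contained, and `knit = FALSE` stops after the conversion step so the generated R Markdown source can be inspected.

```r
library(knitr)

# Stand-in for the lines of a data-assembly.R script
script <- c(
  "#' # Data Assembly Process",
  "#' This line becomes markdown text.",
  "#+ message = FALSE",
  "x <- nrow(mtcars)  # an ordinary R comment stays inside the code chunk",
  "x"
)

# knit = FALSE: convert to R Markdown only, do not run the code or render
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

The #' lines come back as plain markdown and the #+ line becomes the header of an R code chunk. rmarkdown::render("data-assembly.R") is another way to render such a script.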

Example data-assembly.R file:

#' # Data Assembly Process
#' This document provides details on the construction of the final analysis data
#' set.
#' 
#' The packages needed for this work are:
#+ message = FALSE
library(tidyverse)

#' Our first step is to read in the data sets. For this example, we'll just use
#' the `mtcars` data set.
mtcars

#' A summary of the `mtcars` data set is below.
summary(mtcars)

#' Let's only use data records for cars with manual transmissions (`am == 1`).
mt_am_cars <- dplyr::filter(mtcars, am == 1)
mt_am_cars
Peter
  • Hmm. I broke the input up into many small functions (as one "should"), and am reluctant to take all the functions out. I think I will have to return some objects to the calling environment, or at least return a list populated with information about the input files (number of records, date-time stamps, etc.). Once those objects are returned as a list into the calling environment, I can build them into my reports. I think this is the solution. I'll post a working example in a day or so, just to wrap up this thread. – user2292410 Jun 01 '17 at 21:41
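A minimal sketch of the list-returning approach described in this comment, assuming CSV inputs. The reader function `read_input()` and its metadata fields are hypothetical; a temporary file stands in for a real input so the example runs on its own.

```r
# Hypothetical reader for one input type: returns the data together with
# metadata about the source file (record count, variable count, mtime).
read_input <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  info <- list(
    file      = basename(path),
    n_records = nrow(dat),
    n_vars    = ncol(dat),
    modified  = file.info(path)$mtime  # file modification time (POSIXct)
  )
  list(data = dat, info = info)
}

# Self-contained demo: write a small CSV to a temporary file first
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 12, 9)), path, row.names = FALSE)

inp <- read_input(path)

# The info lists from several inputs can be stacked into one table,
# e.g. for knitr::kable(inventory) in the report
inventory <- do.call(rbind, lapply(list(inp$info), as.data.frame))
inventory
```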

Have each function return an object with a class attribute (an S3 class), and define a print method for that class. In the main document, just print the objects. That's the standard R approach to this problem.
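A minimal sketch of that approach. The class name "assembly_step", the constructor `new_assembly_step()`, and the step `filter_manual()` are hypothetical; the point is that `print()` dispatches on the class, so the report only needs to print whatever the step functions return.

```r
# Hypothetical constructor: wraps a step's result in an "assembly_step" object
new_assembly_step <- function(name, data, notes = character()) {
  structure(
    list(name = name, data = data, n = nrow(data), notes = notes),
    class = "assembly_step"
  )
}

# S3 print method: controls what appears when the object is printed
print.assembly_step <- function(x, ...) {
  cat("Step:", x$name, "\n")
  cat("Records:", x$n, "\n")
  if (length(x$notes)) cat("Notes:", paste(x$notes, collapse = "; "), "\n")
  invisible(x)
}

# A step function returns the classed object rather than a bare data frame
filter_manual <- function(dat) {
  out <- dat[dat$am == 1, ]  # am == 1 means manual transmission in mtcars
  new_assembly_step("keep manual transmissions", out,
                    notes = paste(nrow(dat) - nrow(out), "rows dropped"))
}

step <- filter_manual(mtcars)
step  # auto-printing (also in a knitr chunk) calls print.assembly_step()
```

The full data stays available as `step$data` for the next step, while the printed summary is what shows up in the document.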

user2554330