
I am having problems understanding the (practical) difference between the different ways to externalise code in R notebooks. Having referred to previous questions and to the documentation, I am still unclear about the difference between sourcing external .R files and reading them with read_chunk(). For practical purposes, let us consider the following:

  1. I want to load libraries with an external config.R file: the most intuitive way, to me, seems to be to create config.R as

    ```r
    library(first_package)
    library(second_package)
    ...
    ```

    and then, in the main R notebook (say, main.Rmd), call it like

    ```{r}
    source('config.R')
    ```
    
    ```{r}
    # use the libraries included above
    ```
    

    However, this does not seem to recognise the included packages, so sourcing an external config file appears to be useless. The same happens with read_chunk(). Therefore the question is: how do I load libraries at the top so that they are recognised in the main markdown document?

  2. Say I want to define global functions externally and then include them in the main notebook: along the same lines as above, one would put them in an external foo.R file and include that in the main one.

    Again, it seems that read_chunk() does not do the job in this case, whereas source('foo.R') does. The documentation states that the former "only evaluates code, but does not execute it": when would one ever want to only evaluate code but not execute it? Put differently: why would one ever use read_chunk() rather than source(), for practical purposes?

code-glider
gented

  1. > This does not recognise the packages included

    In your example, first_package and second_package are both available in the working environment for the second code chunk.

    Try putting library(nycflights13) in the R file and head(airlines) in the second chunk of the Rmd file. Calling knit("main.Rmd") would fail if the nycflights13 package wasn't successfully loaded with source.

  2. read_chunk() does in fact accomplish this as well (as does source()); they just go about it differently. With source(), the global functions are available directly after the call (as you have found). With read_chunk(), however, since, as you pointed out, it only reads the code without running it, you need to explicitly execute the chunk before the function becomes available. (See my example with third_config_chunk below: including the empty third_config_chunk chunk in the report allows the global some_function to be called in subsequent chunks.)
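A quick way to check point 1 (and the source() half of point 2) outside a notebook is the minimal sketch below. It writes a throwaway script to a temporary file as a stand-in for config.R/foo.R (the file contents and the `add_one` helper are hypothetical), sources it, and checks that both the attached package and the function are visible afterwards. It uses the base package tools so nothing needs to be installed.

```r
# Throwaway stand-in for config.R / foo.R (contents are illustrative only)
config_path <- tempfile(fileext = ".R")
writeLines(c(
  "library(tools)                # attach a package that ships with R",
  "add_one <- function(x) x + 1  # a 'global' helper function"
), config_path)

# source() parses and runs every line immediately; by default
# (local = FALSE) assignments land in the global environment
source(config_path)

# Both effects are visible to any code that runs afterwards,
# e.g. the next chunk of a notebook knitted in the same session
"package:tools" %in% search()  # [1] TRUE
add_one(41)                    # [1] 42
```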

Regarding "only evaluates code, but does not execute it": more precisely, read_chunk() reads the code and stores it under each chunk's label without running it; the code only runs when a chunk with the matching label is evaluated. (This is similar in spirit to R's lazy evaluation of function arguments, although it is a knitr mechanism rather than a language feature.) The idea is that you may want to create a number of functions or template code which is read in but not executed on the spot, allowing you to modify the environment or parameters beforehand. It also allows you to execute the same code chunks multiple times, whereas source() runs the whole file, top to bottom, every time it is called.

Consider an example where you have an external R script which contains a large amount of setup code that isn't needed in your report. It is possible to split this file into many "chunks", which read_chunk() will read in but which won't be evaluated until you explicitly run them.
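The sketch below illustrates this with a hypothetical script: a setup chunk whose `stop()` call stands in for setup code you never want run in the report, and a small chunk that is actually needed. After read_chunk(), only the needed chunk is run, by evaluating its stored text through knitr's internal `knit_code` object (accessed with `:::` because it may not be exported). Inside an .Rmd, an empty chunk with the matching label would normally do this for you. The labels and file contents are illustrative, and knitr is assumed to be installed.

```r
library(knitr)

# Hypothetical external script: one chunk we never want to run, one we need
script <- tempfile(fileext = ".R")
writeLines(c(
  "## ---- heavy_setup",
  "stop('this setup code should never run')",
  "",
  "## ---- needed_chunk",
  "greeting <- paste('hello', 'world')"
), script)

# read_chunk() only stores each chunk's lines under its label; nothing runs,
# so the stop() call above is never reached
read_chunk(script)

# Run only the chunk we need by evaluating its stored text; inside an .Rmd,
# an empty chunk labelled needed_chunk would do the same job
needed_code <- knitr:::knit_code$get("needed_chunk")
eval(parse(text = needed_code))
greeting
```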

To externalise your config.R using read_chunk(), you would write the R script as:

config.R:

```r
# ---- config_preamble
## setup code that is required for config.R
## to run but not for main.Rmd

# ---- first_config_chunk
library(nycflights13)
library(MASS)

# ---- second_config_chunk
y <- 1

# ---- third_config_chunk
some_function <- function(x) {
  x + y
}

# ---- fourth_config_chunk
some_function(10)

# ---- config_output
## code that is output during `source`
## and not wanted in main.Rmd
print(some_function(10))
```
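Because read_chunk() only reads the file, you can check which labels it picked up from the config.R above without running any of it; in particular, nycflights13 does not even need to be installed at this point. This sketch writes the same script to a temporary file and inspects knitr's internal `knit_code` store (again via `:::`, an implementation detail rather than a documented interface). It assumes knitr is installed and a fresh session in which `some_function` has not been defined yet.

```r
library(knitr)

# The config.R from above, written to a temporary file
config_path <- tempfile(fileext = ".R")
writeLines(c(
  "# ---- config_preamble",
  "## setup code that is required for config.R",
  "## to run but not for main.Rmd",
  "",
  "# ---- first_config_chunk",
  "library(nycflights13)",
  "library(MASS)",
  "",
  "# ---- second_config_chunk",
  "y <- 1",
  "",
  "# ---- third_config_chunk",
  "some_function <- function(x) {",
  "  x + y",
  "}",
  "",
  "# ---- fourth_config_chunk",
  "some_function(10)",
  "",
  "# ---- config_output",
  "## code that is output during `source`",
  "## and not wanted in main.Rmd",
  "print(some_function(10))"
), config_path)

read_chunk(config_path)

# Every label is now registered, but none of the code has been run
stored <- knitr:::knit_code$get()
names(stored)
exists("some_function")         # FALSE: third_config_chunk has not run
stored[["third_config_chunk"]]  # the chunk's source lines, as text
```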

To use this script with the externalisation methodology, you would set up main.Rmd as follows:

main.Rmd (the `## ...` lines after a chunk show its knitted output; they are not part of the file):

````markdown
```{r, include=FALSE}
knitr::read_chunk('config.R')
```

```{r first_config_chunk}
```

The packages are now loaded.

```{r third_config_chunk}
```

`some_function` is now available.

```{r new_chunk}
y <- 20
```

```{r fourth_config_chunk}
```
## [1] 30

```{r new_chunk_two}
y <- 100
lapply(seq(3), some_function)
```
## [[1]]
## [1] 101
## 
## [[2]]
## [1] 102
## 
## [[3]]
## [1] 103

```{r source_file_instead}
source("config.R")
```
## [1] 11
````

As you can see, if you were to source this file, there would be no way to change y before some_function is called: the whole script runs, y is reset to 1, and the call prints "11". With read_chunk(), by contrast, the chunks are stored by label, so they can be re-run any number of times (for example, after changing the value of y), and the objects they define can be used in any other way in the current environment (e.g. new_chunk_two). That would not be possible with source() unless you were happy for the rest of the R script to execute as well.
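To run the workflow end to end without creating files by hand, the sketch below builds a cut-down version of main.Rmd as text (the external script mirrors config.R above, trimmed for brevity), knits it with `knitr::knit()`, and prints the resulting markdown. The empty third_config_chunk defines some_function, y is changed in an ordinary chunk, and the empty fourth_config_chunk then prints 30. It assumes knitr is installed; with `text =`, `knit()` returns the knitted markdown as a character vector instead of writing a file.

```r
library(knitr)

# Trimmed-down external script using the same labels as config.R above
cfg <- tempfile(fileext = ".R")
writeLines(c(
  "# ---- second_config_chunk",
  "y <- 1",
  "# ---- third_config_chunk",
  "some_function <- function(x) x + y",
  "# ---- fourth_config_chunk",
  "some_function(10)"
), cfg)

# Cut-down main.Rmd, built line by line; deparse() quotes the path safely
fence <- "```"
rmd <- c(
  paste0(fence, "{r, include=FALSE}"),
  sprintf("knitr::read_chunk(%s)", deparse(cfg)),
  fence,
  "",
  paste0(fence, "{r third_config_chunk}"),  # empty: runs the external code
  fence,
  "",
  paste0(fence, "{r new_chunk}"),
  "y <- 20",
  fence,
  "",
  paste0(fence, "{r fourth_config_chunk}"), # empty: runs some_function(10)
  fence
)

md <- knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")  # the output includes the line "## [1] 30"
```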

ruaridhw
  • Thank you for the thorough answer; it is clearer now. It seems, however, that my mistake was to use the external code from within the same chunk that I used to actually "load" it: apparently one chunk is needed to run the external code, and only then can that code be used, in other, new chunks (which seems very counter-intuitive to me). To be explicit, if I put code in the labelled chunk itself, e.g. a `{r read_first_chunk}` chunk containing `# use loaded libraries`, then your example wouldn't work either. – gented Jan 20 '18 at 15:09
  • Yes, the execution chunk needs to be empty; otherwise whatever is written in it will be executed instead. Alternatively, you can run a chunk's code in the environment immediately after `read_chunk` with `eval(parse(text=knitr:::knit_code$get("third_config_chunk")))`, using whichever chunk name you like. Note, however, that `knit_code` isn't exported by `knitr` (hence the `:::`). – ruaridhw Jan 20 '18 at 15:24
  • I see. However, is there any difference between reading external .R files and .Rmd files? Technically, the chunks themselves could be in the latter format (or is there any reason to prefer the former)? – gented Jan 20 '18 at 16:42
  • Functionally, I don't believe so. It just comes down to personal preference and use case. I use external R files if I want the chunks to make up a larger script which can be distributed as a stand-alone source file to others unfamiliar with R Markdown, or called via another workflow (e.g. `Rscript`). Rmd files are not pure R source, so this wouldn't work with them. – ruaridhw Jan 20 '18 at 17:59
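If the chunks do live in an .Rmd but you still want a plain script that `Rscript` or `source()` can run, `knitr::purl()` can extract the R code from it. The sketch below uses a tiny hypothetical .Rmd passed as text; with the default `documentation = 1`, purl() drops the prose and keeps each chunk header as a comment line, and with `text =` it returns the script as a character vector rather than writing a .R file. It assumes knitr is installed.

```r
library(knitr)

# Hypothetical chunk file written as R Markdown rather than plain R
fence <- "```"
rmd <- c(
  "Some prose that is not R code.",
  "",
  paste0(fence, "{r third_config_chunk}"),
  "some_function <- function(x) x + y",
  fence
)

# purl() keeps only the code (plus chunk headers as comments)
r_code <- purl(text = rmd, quiet = TRUE)
cat(r_code, sep = "\n")

# The extracted code is ordinary R, so it parses (and could be saved and sourced)
parsed <- parse(text = r_code)
```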