Here comes a solution … but: Use it with extreme caution. It messes around with the source code prior to evaluation. Besides, it clutters the output document with diagnostic output. However, for debugging purposes, it might be very useful.
Introduction
As I understand it, the progress window for the knitting process in RStudio cannot be used to log the evaluation progress of a chunk because any chunk output (like message
s) is only show after the whole chunk has been evaluated. Therefore, the only feasible workaround is writing the progress to a file.
The right hook
So the solution will be to write some diagnostic message like "Now evaluating [code]
" to a file, expression by expression. Unfortunately, there is no default knitr hook for "call FUN
before each expression". All the code in a chunk (at least all the code that is to be evaluated) is passed to evaluate::evaluate
in block_exec
(see line 172 in block.R
).
As knitr never iterates over a chunk's code expression by expression, there is no natural way place to implant the logic that writes diagnostic messages. (I didn't delve deeper into evaluate
; maybe such an iteration takes place somewhere therein.) However, the chunk's source code can be manipulated before it is passed to evaluate
at once.
Note that when a chunk is run, its source code is available as value of the chunk option code
. The structure is a character vector, each element representing one line. Manipulating code
allows to modify the source code such that it writes the diagnostic messages on its own while being evaluated.
The logging function
I will use the following small helper function to write the logfile:
writeLog <- function(message, file = paste0(current_input(), ".log"), sep = "\n") {
cat(paste(message, sep), file = file, append = TRUE)
}
The file name of the logfile will be [documentname].log
, e.g. mydocument.Rmd.log
. As writeLog
always appends to an existing file, the following line can be used to purge (erase and recreate) the logfile for each compilation:
if(file.exists(logfile <- paste0(current_input(), ".log"))) { file.remove(logfile) }
A potentially useful extensions of writeLog
could use gc()
to monitor the memory usage.
Modifying the source
The basic idea is to add something like
writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)
writeLog('summary(cars)')
before each expression (in the example above: before summary(cars)
). The problem is, where exactly should the call to writeLog
be located? This is not an issue with code containing exactly one (complete) expression each line, but it is problematic in the following case:
summary( # Here some descriptives:
cars
)
For a chunk with that code, code
will be a vector with three elements:
print(code)
[1] "summary( # Here some descriptives:" " cars" ")"
Simply injecting writeLog
calls after each element is not a good idea … Therefore, and this is where the solution becomes somewhat nasty, code
needs to be parse
d first. With the argument keep.source = FALSE
, all comments, formattings etc. will be removed and we get "clean" expressions.
Before each of these expressions, a writeLog
call can be inserted. This new code replaces the original code using an option hook that I called log
:
opts_hooks$set(log = function(options) {
parsed <- parse(text = options$code, keep.source = FALSE)
newCode <- sapply(as.character(parsed), function(x) {
return(c(
'writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)',
sprintf("writeLog('%s')", x),
x))
})
options$code = as.vector(newCode)
return(options)
})
Results
Starting with the following chunk
```{r, log = TRUE}
summary( # Here some descriptives
cars
)
Sys.sleep(5)
print("foo")
```
the source code becomes
writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)
writeLog('summary(cars)')
summary(cars)
writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)
writeLog('Sys.sleep(5)')
Sys.sleep(5)
writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)
writeLog('print("foo")')
print("foo")
which produces the following logfile:
2016-03-04 12:08:05 - Now evaluating the following expression: summary(cars)
2016-03-04 12:08:05 - Now evaluating the following expression: Sys.sleep(5)
2016-03-04 12:08:10 - Now evaluating the following expression: print("foo")
Using the logfile
As the purpose of the logfile is to reflect the progress during the knitting process, it is important to open it without blocking further write operations on it. Ideally, we even want to see update of the file.
On Windows, for example Notepad++ opens files in a non-blocking way and has an option to autoreload modified files. On Linux, tail -f [documentname].log
should do the trick.
Limitations and caveats
The main limitation relates to the source code that is visible in the output document if echo = TRUE
: It is polluted by the writeLog
calls. Moreover, all comments as well as identations are lost and the syntax is "standardized" (e.g., print('x')
becomes print("x")
). So far, I've found no good solution for this – only a partial one:
- Add
options$codeBackup <- options$code
as the first line in the log
option hook.
Use the output hook source
and replace the argument x
(the source code that will be printed) by options$codeBackup
:
hook_source <- knit_hooks$get("source")
knit_hooks$set(source = function(x, options) {
return(hook_source(options$codeBackup, options))
})
- Problem: Works only with
results = "keep"
or chunks where the only output occurs at the end. Otherwise, the complete code will be repeated whenever a part of the code should be printed.
Besides, manipulating the source code might be dangerous. As parse
is a base R function, I wouldn't expect the parsing itself to be problematic. However, I am not sure about the conversion to character afterwards.
Of even more concern is the insertion of function calls into the original source. Per se, writeLog
is quite innovent and should not interfere with the following code. But you never know. Imagine a piece of code that generates a list of all files in the current working directory (first expression) and relies on this list to be accurate in the second expression. Suddenly, creating the logfile inbetween is not so innocent anymore … (even though the true problem in this example goes back to an TOCTTOU issue in the original code).
The bottom line is: Don't use this code "in production" but only for debugging purposes.
Full code / Example
---
title: "Log evaluation of chunks line by line"
output: html_document
---
```{r}
library(knitr)
if(file.exists(logfile <- paste0(current_input(), ".log"))) { file.remove(logfile) }
# Caution: This will DELETE "file.log"!
writeLog <- function(message, file = paste0(current_input(), ".log"), sep = "\n") {
cat(paste(message, sep), file = file, append = TRUE)
}
opts_hooks$set(log = function(options) {
#options$codeBackup <- options$code # extension: use codeBackup in source hook to replace parsed code
parsed <- parse(text = options$code, keep.source = FALSE)
newCode <- sapply(as.character(parsed), function(x) {
return(c(
'writeLog(sprintf("%s - Now evaluating the following expression: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")), sep = NULL)',
sprintf("writeLog('%s')", x),
x))
})
options$code = as.vector(newCode)
return(options)
})
# hook_source <- knit_hooks$get("source")
# knit_hooks$set(source = function(x, options) {
# return(hook_source(options$codeBackup, options))
# })
```
```{r, log = TRUE}
summary( # Here some descriptives
cars
)
Sys.sleep(5)
print("foo")
```