I am generating several reports with R Markdown. Rendering them one by one works, and so does `%do%`. With `%dopar%` there are three possible outcomes:

  1. Sometimes everything is fine.
  2. Sometimes the reports have different names but the same content.
  3. Sometimes pandoc fails with: pandoc document conversion failed with error 1

How can I fix this?

Here is code that works fine in 100% of cases:

library(tidyverse)
library(parallel)
library(doParallel)



OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"

ServersInDB <- c("server1.ru", "server2.ru")

cores=detectCores(logical = FALSE)

cl <- parallel::makeCluster(cores-1) #not to overload your computer

registerDoParallel(cl)

render_all_obj <- function  (MachineName, OutputFolder, result_foldername)
{
  
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path (OutputFolder, result_foldername  ),
         params = list(MachineName = MachineName)
  )
  
}

foreach (MachineName = ServersInDB) %do% {
  
  render_all_obj(MachineName, OutputFolder, result_foldername)
}

parallel::stopCluster(cl)

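Here is the code that fails. It differs only in using `%dopar%` instead of `%do%` (`cores[1]` is equivalent to `cores` here):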

library(tidyverse)
library(parallel)
library(doParallel)



OutputFolder <- "c:\\temp\\test\\out"
result_foldername <- "Now"

ServersInDB <- c("server1.ru", "server2.ru")

cores=detectCores(logical = FALSE)

cl <- parallel::makeCluster(cores[1]-1) #not to overload your computer

registerDoParallel(cl)

render_all_obj <- function  (MachineName, OutputFolder, result_foldername)
{
  
  library(rmarkdown)
  render(input = "c:\\temp\\test\\proj\\Report.RMD",
         output_file = paste0(MachineName, ".html"),
         output_dir = file.path (OutputFolder, result_foldername  ),
         params = list(MachineName = MachineName)
  )
  
}

foreach (MachineName = ServersInDB) %dopar% {
  
  render_all_obj(MachineName, OutputFolder, result_foldername)
}

parallel::stopCluster(cl)

Here is my Rmd file:


---
output:
  html_document:
    toc: true
    dev: 'svg'
    number_sections: true
    toc_depth: 2
    toc_float: true
    theme: cerulean
    toc_collapsed: true
    self_contained: true
    mathjax: NULL

params: 
  MachineName: "ServerName" #name of server to analyze

---



```{r , echo=FALSE, include=FALSE, results='hide'}

MachineName <- params$MachineName

```



---
title: "My report is about: `r MachineName`"

---

Maxim

1 Answer

The problem was the intermediate file `Report.knit.md`. By default it is created in the directory of the file passed as `input` to `rmarkdown::render()`, and that directory is the same for all parallel processes. As a result, every process tries to create, read, and write the same file.

The workaround is to pass the `intermediates_dir` parameter and give every process its own unique temporary directory.
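
To make the clash concrete, here is a small sketch. It assumes the intermediate file is named after the input with a `.knit.md` suffix, as described above; `knit_md_path()` is a hypothetical helper written for illustration and is not part of rmarkdown.

```r
# Illustration only: approximates how the intermediate file name relates to
# the input path. This is not rmarkdown's internal code.
input   <- "c:/temp/test/proj/Report.RMD"
servers <- c("server1.ru", "server2.ru")

# Hypothetical helper: path of the intermediate file inside a given directory.
knit_md_path <- function(input, dir = dirname(input)) {
  stem <- tools::file_path_sans_ext(basename(input))
  file.path(dir, paste0(stem, ".knit.md"))
}

# Default: every worker derives the same path, whatever MachineName is.
shared <- vapply(servers, function(s) knit_md_path(input), character(1))

# With one temporary directory per worker the paths no longer coincide.
dirs     <- vapply(servers, function(s) tempfile(pattern = "render_"), character(1))
isolated <- vapply(dirs, function(d) knit_md_path(input, d), character(1))

length(unique(shared))    # 1
length(unique(isolated))  # 2
```

The output file name depends on `MachineName`, but the intermediate name does not, which would explain reports with different names and identical content.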

Working solution:

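# Only needed if foreach/%dopar% is used as well; furrr relies on the future plan alone.
doFuture::registerDoFuture()

# Leave one core free, but always keep at least one worker.
workers <- max(1, parallel::detectCores(logical = FALSE) - 1)
future::plan(future::multisession, workers = workers)


ServersInDB <- c("server1.ru", "server2.ru")

render_all_obj <- function  (MachineName)
{
  
  OutputFolder <- "c:/temp/test/out"
  result_foldername <- "Now"
  
  # Unique intermediates directory per call, so workers do not share Report.knit.md.
  tf <- tempfile()
  dir.create(tf)
  # unlink() removes a directory only with recursive = TRUE; on.exit() runs even if render() fails.
  on.exit(unlink(tf, recursive = TRUE), add = TRUE)
  
  rmarkdown::render(input = "c:/temp/test/proj/Report.RMD",
         output_file = paste0(MachineName, ".html"),
         intermediates_dir = tf,
         output_dir = file.path(OutputFolder, result_foldername),
         params = list(MachineName = MachineName)
  )
  
}


furrr::future_map(ServersInDB, render_all_obj)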
Maxim

Changing `intermediates_dir` didn't work in my case, since it somehow affects the interaction between knitr and Pandoc: Pandoc ends up reading the *un*knitted `.Rmd` file, which fails in my case because of inline R code. What worked for me was simply creating a temp copy of the input doc with a unique filename, so the `*.knit.md` files no longer clash between workers. – Salim B, Feb 20 '23 at 03:43
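
The alternative from this comment could look roughly like the sketch below. The helper names `unique_copy()` and `render_one()` are hypothetical, and placing the copy next to the original (so that relative paths inside the document keep resolving) is an assumption; the comment does not say where the copy was created.

```r
# Hypothetical helper: copy `input` beside the original under a unique name.
unique_copy <- function(input) {
  copy <- tempfile(pattern = "Report_", tmpdir = dirname(input), fileext = ".Rmd")
  if (!file.copy(input, copy)) stop("could not copy ", input)
  copy
}

# Hypothetical wrapper: render from the unique copy, then remove the copy.
render_one <- function(MachineName, input, output_dir) {
  copy <- unique_copy(input)
  on.exit(unlink(copy), add = TRUE)  # clean up even if render() fails
  rmarkdown::render(
    input       = copy,
    output_file = paste0(MachineName, ".html"),
    output_dir  = output_dir,
    params      = list(MachineName = MachineName)
  )
}
```

Because each worker renders from a differently named input, the intermediate file derived from that name should be distinct as well, without touching `intermediates_dir`. `render_one()` can be passed to `furrr::future_map()` or called inside `%dopar%` in the same way as `render_all_obj()`.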