
Why do scripts parallelized with mclapply print on a cluster but not in RStudio? Just asking out of curiosity.

library(parallel)  # provides mclapply()
mclapply(1:10, function(x) {
  print("Hello!")
  return(TRUE)
}, mc.cores = 2)
# Hello prints in slurm but not RStudio
Jeff Bezos
  • Just a note to dispute your use of tags. It has something to do with RStudio and is not really related to the cluster at all. For instance, it will print on my laptop running Kubuntu 18.04 LTS. RStudio is notorious for not playing well with some functions from the parallel package, though I am not positive of the CS reasons behind it. – lmo Jun 10 '20 at 17:00
  • Thanks @lmo, I updated the title and tags to show that the question is specific to RStudio – Jeff Bezos Jun 10 '20 at 17:34
  • It prints on my machine using Rstudio. – Phil Jun 10 '20 at 18:14
  • @Phil does it print "hello" or just "TRUE"? – Jeff Bezos Jun 10 '20 at 19:27
  • https://i.imgur.com/XHFuR5z.png – Phil Jun 10 '20 at 19:45
  • This is a classic example of an "it works for me" problem. Whether output sent to stdout and stderr is displayed depends on the environment in which R runs (here the RStudio Console), the underlying operating system, and probably other things. BTW, this output may be relayed in the RStudio Terminal even though it is not in the RStudio Console. (I'll post a solution not using `parallel::mclapply()` as an answer below). – HenrikB Jun 11 '20 at 00:37

1 Answer


None of the functions in the 'parallel' package guarantee that output sent to the standard output (stdout) or the standard error (stderr) on workers is displayed properly. This is true for all types of parallelization approaches, e.g. forked processing (mclapply()) and PSOCK clusters (parLapply()). The reason is that the package was never designed to relay output in a consistent manner.

A good test is to see if you can capture the output via capture.output(). For example, I get:

bfr <- utils::capture.output({
  y <- lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

as expected but when I try:

bfr <- utils::capture.output({
  y <- parallel::mclapply(1:3, FUN = print)
})
print(bfr)
## character(0)

there's no output captured. Interestingly though, if I call it without capturing output in R 4.0.1 on Linux in the terminal, I get:

y <- parallel::mclapply(1:3, FUN = print)
## [1] 1
## [1] 3
## [1] 2

Interesting, eh?

Another suggestion that you might get when using local PSOCK clusters is to set the argument outfile = "" when creating the cluster. Indeed, when you try this on Linux in the terminal, it certainly looks like it works:

cl <- parallel::makeCluster(2L, outfile = "")
## starting worker pid=25259 on localhost:11167 at 17:50:03.974
## starting worker pid=25258 on localhost:11167 at 17:50:03.974

y <- parallel::parLapply(cl, 1:3, fun = print)
## [1] 1
## [1] 2
## [1] 3

But this, too, gives false hope. The output you're seeing appears only because the terminal happens to display it. This might or might not work in the RStudio Console, and you might see different behavior on Linux, macOS, and MS Windows. The key point is that your R session does not see this output at all. If we try to capture it, we get:

bfr <- utils::capture.output({
  y <- parallel::parLapply(cl, 1:3, fun = print)
})
## [1] 1
## [1] 2
## [1] 3
print(bfr)
## character(0)

Interesting, eh? But actually not surprising if you understand the inner details of the 'parallel' package.
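
The capture test above also hints at a workaround that stays within 'parallel': capture the output on the worker and return it together with the value, since the return value is what reliably comes back to the main session. The following is only a minimal sketch, assuming a local PSOCK cluster with two workers; the result layout (`value`, `output`) is chosen purely for illustration and is not part of any package API:

```r
# Local PSOCK cluster (assumption: two workers on this machine)
cl <- parallel::makeCluster(2L)

res <- parallel::parLapply(cl, 1:3, fun = function(x) {
  # Runs on the worker: capture whatever is printed while computing the value
  out <- utils::capture.output(value <- print(x))
  # Return the captured lines alongside the value (illustrative layout)
  list(value = value, output = out)
})

parallel::stopCluster(cl)

# Back in the main session: collect the captured lines in task order
bfr <- unlist(lapply(res, function(r) r$output))
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"
```

Because the text is now ordinary data in the main session, it can be printed, logged, or captured like any other output there. The trade-off is that nothing is shown until a task has finished.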


(Disclaimer: I'm the author) The only parallel framework that I'm aware of that properly relays standard output (e.g. cat(), print(), ...) and message conditions (e.g. message()) to the main R session is the future framework. You can read about the details in its 'Text and Message Output' vignette, but here's an example showing that it works:

future::plan("multicore", workers = 2) ## forked processing

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

It works the same regardless of underlying parallelization framework, e.g. with local PSOCK workers:

future::plan("multisession", workers = 2) ## PSOCK cluster

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

This works the same on all operating systems and environments where you run R, including the RStudio Console. It also behaves the same regardless of which future map-reduce framework you use, e.g. (here) future.apply, furrr, and foreach with doFuture.

HenrikB
  • Thank you so much for your detailed answer @HenrikB, this gives me a lot to think about. I've never heard of the `future` package but it looks like it might be useful in projects where I want to track progress. – Jeff Bezos Jun 11 '20 at 16:29
  • Re "... where I want to track progress": note that future captures and buffers the output and releases it "as soon as possible" - the rule of thumb is that the output will be relayed when a parallel worker is done with its task but not before. If you need near-live progress updates, see the [**progressr**](https://cran.r-project.org/web/packages/progressr/index.html) package, which is designed to work with the future framework. – HenrikB Jun 11 '20 at 16:52
  • @HenrikB could I ask a follow-up? I'm running into a similar issue, but I'm using R to call a third-party executable via a `system` call. When run in serial it displays the standard output from the executable, but for some interfaces like RStudio and the R GUI it does not print for parallel jobs. I tried your future packages but it does the same, no output. Is there any way to get that consistently from a parallel call to `system`? – Cole Monnahan Jan 26 '21 at 23:56
  • No, whether or not you'll be able to _view_ `system()` output to standard output (stdout) and standard error (stderr) depends on the environment you run R in. It differs widely and is hard to predict. – HenrikB Jan 27 '21 at 01:35
  • "I tried your future packages but it does the same, no output." - I'm not sure exactly what you tried with the {future} framework, but see the answer above - stdout as well as all messages and warnings are indeed captured by the future framework and relayed in the main R session when results come back. – HenrikB Jan 27 '21 at 01:37
  • `system2()` allows you to capture stdout and stderr as character strings in R, which you then can output when everything is done. In addition to that, you might also wanna look into the **callr** package. – HenrikB Jan 27 '21 at 01:39
  • @HenrikB I just mean that I tried calling system. Here's a reprex that fails in all parallel packages I tried (foreach/doParallel, future, snowfall, parallel). `test.fn <- function(i) {Sys.sleep(.5); system('where Rterm')}`. When run in serial it always prints to the console. In parallel it only prints when using Rterm, not Rgui nor RStudio. I also tried callr and it doesn't work. The whole idea is to show the progress output by this particular program, so saving it until the end doesn't really help much. It may be hopeless but I thought I would check. Thanks! – Cole Monnahan Jan 27 '21 at 19:13
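
The `system2()` suggestion from the comments can be sketched roughly as follows. This is a minimal illustration, not a fix for live progress: it assumes a local PSOCK cluster with two workers and uses `Rscript` (located via `R.home("bin")`) purely as a stand-in for the third-party executable.

```r
# Local PSOCK cluster (assumption: two workers on this machine)
cl <- parallel::makeCluster(2L)

res <- parallel::parLapply(cl, 1:2, function(i) {
  # Stand-in for a third-party program: Rscript printing one line to stdout
  exe  <- file.path(R.home("bin"), "Rscript")
  code <- sprintf("cat('task', %d, 'done\\n')", i)
  # stdout = TRUE and stderr = TRUE make system2() return the program's
  # output as a character vector instead of sending it to the console
  system2(exe, c("-e", shQuote(code)), stdout = TRUE, stderr = TRUE)
})

parallel::stopCluster(cl)

# Replay the captured output in the main R session, one task at a time
for (out in res) writeLines(out)
```

Since the text travels back as the return value of each task, it is displayed the same way in any front end, but only after the corresponding task has finished, so it does not provide a live view of the program's progress.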