In the question here, the OP mentioned using kill to stop each individual process. I wasn't aware that connections remain open if you push "Stop" while running code in parallel in RStudio on Windows 10, and like a fool I tried to run the same thing 4-5 times. Now I have about 15 open connections on my poor 3-core machine eating up all of my CPU. I can restart R, but then I would have to recreate all of my unsaved objects, which would take a good hour, and I'd rather not waste the time. Likewise, the answers in the linked post are great, but all of them are about how to prevent the issue in the future, not how to actually solve the issue once you have it.

So I'm looking for something like:

# causes problem
lapply(c('doParallel', 'doSNOW'), library, character.only = TRUE)
n_c <- detectCores() - 1
cl <- makeCluster(n_c)
registerDoSNOW(cl)
stop()           # stands in for an error or pressing "Stop"
stopCluster(cl)  # not reached

# so to close off the connections I'd want something like (pseudocode)
cls <- as.data.frame(showConnections())
cls$description %>% kill  # `kill` is not an R function

The issue is very frustrating; any help would be appreciated.

JoeTheShmoe
    If you ran this multiple times, then you silently discarded the previous values of `cl`. I believe there is no recovering them. Perhaps your only avenue (without restarting the current R) is to kill the processes manually. Fortunately, `Sys.getpid()` will tell you the PID of the current session, so don't kill that one. (Killing processes is OS-specific. The `kill` referenced there is an OS command, not an R command. Windows: `taskmgr`, *Details*, sort/search by PID; unixy: `pgrep R` or `pgrep Rterm`, then `kill ...` each number other than the current PID; mac: I think unixy, but you're on your own.) – r2evans Sep 05 '18 at 17:43
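
The manual cleanup described in the comment above could be scripted from the current session on Windows. This is a minimal sketch, not a tested recipe. It assumes the orphaned workers appear as `Rscript.exe` in `tasklist`, which is worth confirming in Task Manager first, and that no other `Rscript.exe` jobs you care about are running. The helper names `parse_tasklist_pids` and `orphan_worker_pids` are made up for illustration.

```r
# Parse the lines printed by `tasklist /FO CSV /NH` into integer PIDs.
parse_tasklist_pids <- function(lines) {
  lines <- lines[grepl('^"', lines)]  # drop e.g. "INFO: No tasks are running ..."
  if (length(lines) == 0L) return(integer(0))
  tab <- utils::read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
  as.integer(tab[[2]])                # second column holds the PID
}

# List PIDs of processes with the given image name, excluding this session.
# ASSUMPTION: workers run as "Rscript.exe"; verify before killing anything.
orphan_worker_pids <- function(image = "Rscript.exe") {
  filter <- shQuote(paste("IMAGENAME eq", image), type = "cmd")
  out <- system2("tasklist", c("/FI", filter, "/FO", "CSV", "/NH"), stdout = TRUE)
  setdiff(parse_tasklist_pids(out), Sys.getpid())
}

# Usage (Windows only; left commented out so nothing is killed by accident):
# pids <- orphan_worker_pids()
# print(pids)            # inspect the list first
# tools::pskill(pids)    # then terminate the listed processes
```

Note that this does not distinguish orphaned workers from workers of a cluster you still hold in `cl`, so any live cluster would have to be recreated afterwards.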

1 Answer

Use

autoStopCluster <- function(cl) {
  stopifnot(inherits(cl, "cluster"))
  env <- new.env()
  env$cluster <- cl
  attr(cl, "gcMe") <- env
  reg.finalizer(env, function(e) {
    message("Finalizing cluster ...")
    message(capture.output(print(e$cluster)))
    try(parallel::stopCluster(e$cluster), silent = FALSE)
    message("Finalizing cluster ... done")
  })
  cl
}

and then set up your cluster as:

cl <- autoStopCluster(makeCluster(n_c))

Old cluster objects no longer reachable will then be automatically stopped when garbage collected. You can trigger the garbage collector by calling gc(). For example, if you call:

cl <- autoStopCluster(makeCluster(n_c))
cl <- autoStopCluster(makeCluster(n_c))
cl <- autoStopCluster(makeCluster(n_c))
cl <- autoStopCluster(makeCluster(n_c))
cl <- autoStopCluster(makeCluster(n_c))
gc()

and watch your OS's process monitor, you'll see lots of workers being launched, but once the garbage collector runs, only the most recent set of cluster workers remains.
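
To see the mechanism without launching any workers, the same pattern can be tried on a plain list. This is only an illustrative sketch: `track()` and the `finalized` counter are made-up names, and the counter increment stands in for the `stopCluster()` call.

```r
finalized <- 0L  # counts how many finalizers have run

# Same pattern as autoStopCluster(), but with a plain list instead of a
# cluster and a counter increment instead of stopCluster().
track <- function(obj) {
  env <- new.env()
  attr(obj, "gcMe") <- env
  reg.finalizer(env, function(e) finalized <<- finalized + 1L)
  obj
}

x <- track(list("first"))
x <- track(list("second"))  # the first object is no longer reachable
invisible(gc())             # finalizers of unreachable objects run here
print(finalized)            # number of finalizers that have run so far
```

The object still bound to `x` is not finalized, which mirrors how the most recent cluster survives in the example above.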

EDIT 2018-09-05: Added debug output messages to show when the registered finalizer runs, which happens when the garbage collector runs. Remove those message() lines and use silent = TRUE if you want it to be completely silent.

HenrikB
  • Connections are still open, and it seems that the function opened more, so it doesn't work as I originally wanted. I think I'll have to try @r2evans's suggestion. I suppose it does work as you say, in that clusters can be removed/updated and rerun, but unfortunately this falls in the "forward-thinking solution" category rather than the backward-looking one. I haven't tried this myself, but perhaps the same thing could be achieved, as suggested in the other post, by using global variables for the cluster objects. – JoeTheShmoe Sep 05 '18 at 21:40
  • It could be that your workers are running for very long. I don't think `stopCluster()` kills worker processes immediately - it rather sends a "message" to each worker asking them to stop as soon as they receive the message. A worker will only read such messages when it's done processing the code/function/expression you've asked it to run. If you need to kill workers before they finish, yes, then you need to do what @r2evans suggests. – HenrikB Sep 06 '18 at 02:44
  • I've updated `autoStopCluster()` above such that it also outputs debug messages when the garbage collector runs. At least that can help you verify that `stopCluster()` is actually called on those removed cluster objects. I've verified that my example works on Linux and Windows in plain R as well as in the RStudio Console (using R 3.5.1). I've also verified that it works with both `snow::makeCluster()` (that you use because you attach `doSNOW` last) and `parallel::makeCluster()`. – HenrikB Sep 06 '18 at 02:45
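
One way to check from inside R whether connections are actually being released, as discussed in the comments above, might be to count the session's open socket connections before and after `gc()`. This is a rough sketch: `count_socket_connections()` is a hypothetical helper, and it assumes the cluster connections are listed by `showConnections()` with class `sockconn`.

```r
# Count open socket connections held by this R session.
count_socket_connections <- function() {
  con <- showConnections()  # character matrix, one row per open connection
  sum(con[, "class"] == "sockconn")
}

# Usage with autoStopCluster() from the answer (commented out: launches workers):
# n_before <- count_socket_connections()
# cl <- autoStopCluster(parallel::makeCluster(2))
# cl <- autoStopCluster(parallel::makeCluster(2))
# invisible(gc())
# count_socket_connections() - n_before  # sockets opened since and still open
```

As noted in the comments, busy workers only stop once they finish their current task, so their processes may outlive the closed connections for a while.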