
I just encountered the R packages furrr and future. I would love to use them to write flexible code that will use multiple cores if they are available on machines running Windows or OSX. I would also love for the default number of "available" cores to be something like parallel::detectCores() - 1, rather than detectCores(). plan(multiprocess) seems like the streamlined, idiomatic way to almost do this, but it defaults to using all the cores on the machine. I've encountered more explicit ways to specify the "plan". What is the idiomatic way to keep mostly the default behavior of plan(multiprocess), but limit the number of cores to 1 fewer than whatever detectCores() would return?

EDIT Based on @HenrikB's comment below, I believe that a reasonable answer to this question would be along the lines of options(future.plan = "multiprocess", mc.cores = parallel::detectCores() - 1L).
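For concreteness, here is a minimal sketch of that options-based setup; the package load and the verification calls are my additions for illustration, not part of the original one-liner:

options(future.plan = "multiprocess",
        mc.cores    = parallel::detectCores() - 1L)  # cap workers at one fewer than the machine has

library(future)

plan()             # expected to pick up the multiprocess default from the future.plan option
availableCores()   # expected to report detectCores() - 1, via the mc.cores option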

EDIT I am finding that this method frequently doesn't use the multiprocess plan (or at least does not go parallel with future_map_dfr) when it seems it could. By contrast

nc <- detectCores() - 1
plan(strategy = multiprocess, workers = nc)

seems to get it going in parallel. I'm leaving the question unanswered for now.
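For completeness, here is the explicit-plan version as a runnable sketch; the toy function and the future_map_dfr call are only for illustration:

library(future)
library(furrr)

nc <- parallel::detectCores() - 1            # leave one core free
plan(strategy = multiprocess, workers = nc)

slow_fn <- function(x) { Sys.sleep(1); data.frame(x = x, y = x^2) }  # stand-in for real work
res <- future_map_dfr(1:4, slow_fn)          # elements are processed in parallel across the nc workers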

Michael Roswell

1 Answer


Take a look at ?future::multiprocess. You can do:

plan(multiprocess(workers = 3))

if you have 4 cores on your computer.

thc
  • Thanks for that. Does this mean everything that happens in that plan should be used as the argument to "expr" in `multiprocess`? – Michael Roswell Jan 23 '19 at 23:31
  • Also, this approach should work when I know how many cores are on my computer, but I'm wondering if there is clever functionality (in the same way that `multiprocess` can set up the cluster according to the OS) to set the number of workers still based on what's available, but not as **all** that's available. Thinking about writing code that a collaborator could just pull from git and run. – Michael Roswell Jan 23 '19 at 23:36
  • Author of future here: Note that `availableCores()` defaults to `parallel::detectCores()` but will also acknowledge lots of alternative settings that may be in place on the machine where the code runs. Specifically, if you're on a multi-tenant/multi-user machine, `availableCores()` will make sure your code plays nice on that machine - you should _not_ use `detectCores()` in those cases. If you want to override the default and still be a good citizen, I recommend setting, say, `options(mc.cores = parallel::detectCores() - 1L)` (sketched below these comments) - that will use _at most_ that number of cores. – HenrikB Jan 24 '19 at 00:25
  • @HenrikB forgive me if this doesn't make sense, but it just seemed to me that `plan(multiprocess, workers=7)` returned the following error: `Error in fun(expr = expr, envir = envir, substitute = FALSE, lazy = lazy, : argument "expr" is missing, with no default` while `plan(strategy=multiprocess, workers=7)` worked. It was the error on first attempt that led me to this question. – Michael Roswell Jan 24 '19 at 02:02
  • @Michael, that's odd - first time I've heard of this. I can only assume that you have some other package attached that masks one or more of the future functions. You can test for this with `identical(plan, future::plan)` and `identical(multisession, future::multisession)` - they return TRUE if there's no conflict. Also, retry in a fresh R session. – HenrikB Jan 24 '19 at 02:19
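Following HenrikB's comment about `availableCores()`, a sketch of that setup; the `nbrOfWorkers()` check is an addition here purely to verify the result:

library(future)

options(mc.cores = parallel::detectCores() - 1L)  # cap, rather than hard-code, the worker count

availableCores()   # respects mc.cores as well as various multi-user / scheduler settings
plan(multiprocess) # the default number of workers is availableCores()
nbrOfWorkers()     # on an otherwise unconstrained machine, this should be detectCores() - 1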