I'm testing out the targets package and am running into a problem with customizing parallelization. My workflow has two steps, and I'd like to parallelize the first step over 4 workers and the second step over 16 workers. I want to know if I can solve the problem by calling tar_make_future(), and then specifying how many workers each step requires in the tar_target() calls. I've got a simple example below, where I'd like the data step to execute with 1 worker and the sums step to execute with 3 workers.
library(targets)
tar_dir({
  tar_script({
    library(future)
    library(future.callr)
    library(dplyr)
    plan(callr)
    list(
      # Goal: this step should execute with 1 worker
      tar_target(
        data,
        data.frame(
          x = seq_len(6),
          id = rep(letters[seq_len(3)], each = 2)
        ) %>%
          group_by(id) %>%
          tar_group(),
        iteration = "group"
      ),
      # Goal: this step should execute with 3 workers, in parallel
      tar_target(
        sums,
        sum(data$x),
        pattern = map(data),
        iteration = "vector"
      )
    )
  })
  tar_make_future()
})
I know that one option is to configure the parallel backend separately within each step, and then call tar_make() to execute the workflow serially. I'm curious about whether I can get this kind of result with tar_make_future().
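
For reference, here is a minimal sketch of what I mean by that fallback, assuming future.apply for the within-target parallelism. The multisession backend and the split()/future_sapply() grouping are just illustrative choices, not the only way to do it:

library(targets)
tar_dir({
  tar_script({
    tar_option_set(packages = c("future", "future.apply"))
    list(
      # Runs serially under tar_make(): effectively 1 worker
      tar_target(
        data,
        data.frame(
          x = seq_len(6),
          id = rep(letters[seq_len(3)], each = 2)
        )
      ),
      # Parallelism is configured inside the target itself:
      # 3 workers, one sum per group of rows
      tar_target(
        sums,
        {
          future::plan(future::multisession, workers = 3)
          future.apply::future_sapply(split(data$x, data$id), sum)
        }
      )
    )
  })
  tar_make() # serial executor; each target manages its own workers
})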