I'm trying to optimize a highly parallel and memory-intensive targets pipeline. I'm noticing that the wall clock time for a downstream dynamic branch target is much longer than its reported execution time. Example:
```
● built branch PSUT_Re_all_Chop_all_Ds_all_Gr_all_f29c72e5 [11.05 seconds]
```
Wall clock time: 20.07 seconds.
To optimize, I would like to reduce the discrepancy between wall clock time and execution time, if possible. But what could be causing this discrepancy?
Background:
- The input data for each branch target (e.g., `_f29c72e5`) is created dynamically from rows of a (much) larger upstream data frame target (see the configuration sketch after this list).
- I set `storage = "worker"` and `retrieval = "worker"`, as suggested for highly parallel pipelines at https://books.ropensci.org/targets/performance.html.
- I set `memory = "transient"` and `garbage_collection = TRUE`, as suggested for high-memory pipelines at https://books.ropensci.org/targets/performance.html.
- The entire upstream (input) data frame takes about 8 seconds to read from disk with `tar_read()` in an interactive session, which is nearly the full discrepancy between wall clock time and execution time.
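For reference, here is a minimal sketch of the setup. The target and function names (`psut_data`, `read_psut_data()`, `process_slice()`) are hypothetical placeholders; only the `tar_option_set()` values and the dynamic branching over rows of the upstream data frame reflect the actual pipeline.

```r
# _targets.R (sketch)
library(targets)

# Settings from the performance chapter of the targets manual:
# worker-side storage/retrieval for highly parallel pipelines,
# transient memory + garbage collection for high-memory pipelines.
tar_option_set(
  storage = "worker",
  retrieval = "worker",
  memory = "transient",
  garbage_collection = TRUE
)

list(
  # Large upstream data frame; takes ~8 seconds to load with tar_read().
  tar_target(psut_data, read_psut_data()),   # read_psut_data() is a placeholder

  # Downstream dynamic branches, one per slice of the upstream data frame.
  tar_target(
    PSUT_Re_all_Chop_all_Ds_all_Gr_all,
    process_slice(psut_data),                 # process_slice() is a placeholder
    pattern = map(psut_data)
  )
)
```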
Thus, my working theory is that each dynamically created downstream branch loads the entire upstream target on its worker, slices it, and then passes only the relevant slice to that branch's function. A rough way to check this is sketched below.
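Assuming the hypothetical target name `psut_data` from the sketch above, one could compare the interactive load time of the upstream data frame against the recorded per-branch execution times (which, I believe, exclude the time spent loading dependencies on the worker):

```r
library(targets)

# Time an interactive load of the full upstream data frame (observed: ~8 s).
load_time <- system.time(upstream <- tar_read(psut_data))["elapsed"]

# Recorded execution time (in seconds) for each target and branch.
meta <- tar_meta(targets_only = TRUE)
branch_seconds <- meta[, c("name", "seconds")]

# If (wall clock per branch) - (recorded seconds) is roughly load_time,
# that is consistent with each branch re-loading the whole upstream target.
load_time
head(branch_seconds)
```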
Is that theory plausible? If so, I will create an example project and post a follow-up question about how to solve the problem.
Thanks in advance for insights.