
I'm using the targets pipelining system in R and am wondering how best to use static branching. I have a set of parameters, and I'd like to compute results for most, but not all, of their combinations. Note that `N_source_components` and `N_target_components` aren't used by the `agg_neighbourhoods` target, but they are used by other targets that I didn't include in this example. With the current setup, `agg_neighbourhoods` will be run too many times (targets doesn't understand that not all columns in the `values` argument of `tar_map` are relevant for all targets, right?). Is there a smarter way?

I already tried nesting another `tar_map` call within the one shown here and moving `N_source_components` and `N_target_components` into it. This fixes the redundant executions of `agg_neighbourhoods`, but it doesn't let me filter out undesirable combinations the way I do now, because the value of `query` isn't known at 'compilation' time.

Many thanks :)

```r
library(targets)
library(tarchetypes)  # tar_map()
library(dplyr)        # %>%

tar_map(
  values = tidyr::expand_grid(
    query = c('6369', '6489', '6493'),
    k = c(10, 30, 50),
    d = c(5, 10, 15),
    genelist = c(
      'informativeV15',
      'informativeV15_monotonic',
      'informativeV15_monoreporter'
    ),
    N_source_components = 10L,
    N_target_components = as.integer(c(3, 5))
  ) %>%
    # Drop the combinations that are not needed
    dplyr::filter(!(query %in% c('6369') & N_target_components > 3)),

  tar_target(agg_neighbourhoods, {
    f(
      so = tar_read(so_target, branch = e2i(query))[[1]],
      genelist = genelist,
      k = k,
      d = d
    )
  }, iteration = 'list')
)
```
Extrapolator
  • I think your current solution of filtering `values` is a good way to limit the combinations of arguments that get instantiated as targets. I would also advise against `tar_read()` inside a target; it's better to let `tar_map()` substitute target names in as symbols from `values` (you can define a column of symbols with `rlang::syms()`). Maybe I am not following exactly what you are after, in which case a simpler example might help. – landau Mar 09 '22 at 14:34
  • Thanks landau for the quick answer and for developing this package! What I did was perhaps unorthodox and probably suboptimal: `so_target` is a dynamic target with the 'list' iteration type. Can I still access individual items from this target with `rlang::syms()`? I suspect `syms(so_target[[i]])` wouldn't work, but I haven't tried it yet. – Extrapolator Mar 09 '22 at 19:50
  • As for my original question, perhaps this will help: the `agg_neighbourhoods` targets now get names like `agg_neighbourhoods_6489_50_5_informativeV15_10_3`. The last part (`_10_3`) has absolutely no effect on the result (but does affect other targets not shown). I don't see a way of moving the last two parameters into a nested `tar_map` while still filtering out undesired combinations based on both a parameter in the 'parent' `tar_map` (`query`) and one in the child `tar_map` (`N_target_components`), which drives me towards doing it redundantly. – Extrapolator Mar 09 '22 at 19:58
  • Dynamic branching is not designed for that kind of individual branch access. In a typical pipeline, you work with dynamic targets as a whole and let the package worry about individual branch relationships. I'm still not following the example entirely, but I suspect there may be an easier way to express the pipeline if you reformulate the problem at a conceptual level. – landau Mar 09 '22 at 20:02
  • That's what I suspected, but I didn't clearly understand it yet when I designed the pipeline. I'll put some effort into reformulating `so_target` and all its downstream dependencies. – Extrapolator Mar 09 '22 at 20:25
  • I could try defining the values of the `tar_map` call in a separate, global object and then having two or more `tar_map`s on different subsetted versions of that global object. – Extrapolator Mar 09 '22 at 20:27

1 Answer


Hopefully this is helpful to someone. In simpler terms, my problem was that some targets were being run needlessly: I had to filter out some parameter combinations before instantiating the targets, and not all parameters were used by all targets. A simpler, complete example of this scenario would be:

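```r
library(targets)
library(tarchetypes)  # tar_map()
library(dplyr)        # %>%

tar_map(
  # All A x B combinations, minus the unwanted ones
  values = tidyr::expand_grid(A = 1:2, B = 1:4) %>%
    dplyr::filter(!(A == 2 & B > 2)),

  tar_target(tarX, A*3),         # uses only A

  tar_target(tarY, A*4 + B^2)    # uses A and B
)
```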

`tarX` is run once for every row of `values`, i.e. for each value of `B`, even though only one evaluation per value of `A` is required. However, because the excluded combinations depend on both `A` and `B`, the required combinations have to be specified up front by filtering the full grid.
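A quick way to see the redundancy is to count rows. This small sketch uses base R's `expand.grid()`, logical indexing and `unique()` as stand-ins for the tidyverse calls above, so it runs without any packages; the name `grid` is only illustrative.

```r
# Same grid as in the single tar_map() above, built with base R
grid <- expand.grid(A = 1:2, B = 1:4)
# Drop the combinations that are not needed
grid <- grid[!(grid$A == 2 & grid$B > 2), ]

nrow(grid)              # 6: one tarX (and one tarY) target per row
length(unique(grid$A))  # 2: the number of distinct computations tarX needs
```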

Seen in this cleaner, abstracted form, the solution becomes more obvious: make two calls to `tar_map()`, each operating on a tailored selection of the parameter grid's columns.

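```r
library(targets)
library(tarchetypes)  # tar_map()
library(dplyr)        # %>%

# One global grid of all required combinations
param_grid <-
  tidyr::expand_grid(A = 1:2, B = 1:3) %>%
  dplyr::filter(!(A == 2 & B > 2))

list(
  # tarX uses only A: keep one row per distinct value of A
  tar_map(
    values = param_grid %>%
      dplyr::select(-B) %>%
      dplyr::distinct(),

    tar_target(tarX, A*3)
  ),

  # tarY uses A and B: map over the full filtered grid
  tar_map(
    values = param_grid,

    tar_target(tarY, A*4 + B^2)
  )
)
```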
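Applied to the grid from the question, the same idea means giving `agg_neighbourhoods` its own `tar_map()` over only the columns it uses (`query`, `k`, `d`, `genelist`), while the other targets keep the full filtered grid. The counting sketch below uses base R stand-ins (`expand.grid()`, `unique()`) so it runs on its own; `grid` and `agg_values` are illustrative names.

```r
# The question's parameter grid, built with base R instead of tidyr
grid <- expand.grid(
  query = c("6369", "6489", "6493"),
  k = c(10, 30, 50),
  d = c(5, 10, 15),
  genelist = c("informativeV15", "informativeV15_monotonic",
               "informativeV15_monoreporter"),
  N_source_components = 10L,
  N_target_components = c(3L, 5L),
  stringsAsFactors = FALSE
)
# Same filter as in the question
grid <- grid[!(grid$query %in% "6369" & grid$N_target_components > 3), ]

# Only the columns agg_neighbourhoods actually uses
agg_values <- unique(grid[, c("query", "k", "d", "genelist")])

nrow(grid)        # 135 agg_neighbourhoods targets with a single tar_map()
nrow(agg_values)  # 81 when it gets its own tar_map()
```

In the pipeline, `agg_values` corresponds to `param_grid %>% dplyr::distinct(query, k, d, genelist)` passed as `values` to the `tar_map()` that contains `agg_neighbourhoods`.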

Perhaps there are other solutions as well. I'd be happy to hear them.
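If `tarY` also needed the result of the matching `tarX` target, the second `tar_map()` could refer to it through a column of symbols built with `rlang::syms()`, as suggested in the comments, instead of calling `tar_read()` inside a target. This is a minimal sketch that assumes `tar_map()` names the first group `tarX_1` and `tarX_2` (its default suffixes from the values of `A`; check with `tar_manifest()`). The column name `upstream` is hypothetical, and `tarY` is changed to use `tarX` purely to illustrate the dependency.

```r
# _targets.R
library(targets)
library(tarchetypes)  # tar_map()
library(dplyr)        # %>%

param_grid <-
  tidyr::expand_grid(A = 1:2, B = 1:3) %>%
  dplyr::filter(!(A == 2 & B > 2))

list(
  # One tarX target per distinct A (assumed names: tarX_1, tarX_2)
  tar_map(
    values = param_grid %>% dplyr::distinct(A),
    tar_target(tarX, A * 3)
  ),

  tar_map(
    # Hypothetical column `upstream`: the symbol of the matching tarX target
    values = param_grid %>%
      dplyr::mutate(upstream = rlang::syms(paste0("tarX_", A))),
    # Build the target names from A and B only, not from the symbol column
    names = c(A, B),
    # `upstream` is substituted as a symbol, so e.g. tarY_2_1 depends on tarX_2
    tar_target(tarY, upstream * 4 + B^2)
  )
)
```

Because the dependency appears as a plain symbol in each command, targets can detect it statically and rerun only the `tarY` targets whose `tarX` changed.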

Extrapolator