I want to generate a large plan whose arguments depend on previously computed targets. Is that possible at all?

Specifically, I need something along the lines of:

drake_plan(
  data = get_data(),
  lots_of_sds = get_sds_from_a_complex_pipeline(),
  analysis = target(
    fun(data, mean = mean_val, sd = sd_val),
    transform = cross(mean_val = c(2, 5), sd_val = !!lots_of_sds)
  )
)

The problem is that this (and similar variations) fails: lots_of_sds is not yet defined when the plan is built, so the transformation cannot expand the plan.

Has anyone faced a similar situation in the past? Any ideas/workarounds?

Thanks! I'm using drake 7.0.0 and R version 3.5.3.

1 Answer

You are almost there. All you need to do is define lots_of_sds beforehand outside drake_plan(), which is standard procedure when you use !!.

library(drake)

lots_of_sds <- c(1, 2)

drake_plan(
  data = get_data(),
  analysis = target(
    fun(data, mean = mean_val, sd = sd_val),
    transform = cross(mean_val = c(2, 5), sd_val = !!lots_of_sds)
  )
)
#> # A tibble: 5 x 2
#>   target       command                    
#>   <chr>        <expr>                     
#> 1 data         get_data()                 
#> 2 analysis_2_1 fun(data, mean = 2, sd = 1)
#> 3 analysis_5_1 fun(data, mean = 5, sd = 1)
#> 4 analysis_2_2 fun(data, mean = 2, sd = 2)
#> 5 analysis_5_2 fun(data, mean = 5, sd = 2)

Created on 2019-05-16 by the reprex package (v0.2.1)

The value of lots_of_sds needs to already exist before you run drake_plan() or make(). This limitation of drake will be difficult to overcome: https://github.com/ropensci/drake/issues/685.
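As a rough sketch of what that means for the question's setup, the pipeline can be run eagerly, outside the plan, so its values exist when `drake_plan()` expands the transform. The body of `get_sds_from_a_complex_pipeline()` below is a hypothetical stand-in, not the asker's real function, and the trade-off is that drake no longer tracks that step as a target.

```r
library(drake)

# Hypothetical stand-in for the question's pipeline function (assumption).
get_sds_from_a_complex_pipeline <- function() c(1, 2)

# Run the pipeline up front so the values exist at plan-construction time.
lots_of_sds <- get_sds_from_a_complex_pipeline()

# drake_plan() only builds the plan; get_data() and fun() are not called here.
plan <- drake_plan(
  data = get_data(),
  analysis = target(
    fun(data, mean = mean_val, sd = sd_val),
    transform = cross(mean_val = c(2, 5), sd_val = !!lots_of_sds)
  )
)
```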

landau
  • Thanks! Would you say an OK workaround would be to use two different drake plans/makes linked with an old-fashioned Makefile? – Fernando Cagua May 20 '19 at 01:01
  • Possible, but tricky. Why not try a single plan with a predetermined number of groups for your `analysis_*` targets? An applicable strategy was introduced in https://github.com/ropensci/drake/issues/833. You could then use a `combine()` transform to post-process, aggregate, and clean up the results as needed. If the return values of your targets are tidy data structures, this could still end up quite nice. – landau May 20 '19 at 03:00
  • For a two-plan approach, you would need to ensure the targets from the first plan are available to the second one, either with `make(plan1); loadd(); make(plan2)` or with strategic `file_out()` calls. Also, I think a `Makefile` is unnecessary, maybe even a little risky, since `make(plan1)` and `make(plan2)` both need a chance to run, so you would need phony targets. You could do it, but it is not something I would recommend to most users. – landau May 20 '19 at 03:01
  • Wait, do you know the length of `lots_of_sds` in advance? If you do, you could `map()` over an index set instead of the actual values. – landau May 20 '19 at 04:52
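
The index-set idea from the last comment might look roughly like the sketch below. It assumes the length of `lots_of_sds` is known ahead of time; `n_sds` and `sd_index` are hypothetical names introduced here, and `cross()` is used in place of `map()` only to keep the question's mean-by-sd combinations.

```r
library(drake)

# Assumption: the number of sd values is known before the plan is built.
n_sds <- 2

plan <- drake_plan(
  data = get_data(),
  # The sd values stay an ordinary target, computed at make() time.
  lots_of_sds = get_sds_from_a_complex_pipeline(),
  analysis = target(
    # Each analysis target picks its sd by position rather than by value.
    fun(data, mean = mean_val, sd = lots_of_sds[sd_index]),
    transform = cross(mean_val = c(2, 5), sd_index = !!seq_len(n_sds))
  )
)
```

Because every analysis command mentions `lots_of_sds`, drake should treat it as a dependency and build it before any of the analysis targets.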