I have a plan like:

plan = drake::drake_plan(

    targ1 = target(
        f1(input)
        , dynamic = map(input)
    )

    , targ2 = target(
        f2(targ1)
        , dynamic = map(targ1)
    )
)

Here the function f1 actually yields multiple chunks of output (say, in a list), and I'd like these chunks to be processed separately when targ2 is computed. Is this possible?

Here's a minimal example:

f1 = function(x){
    return(list(x,x+1))
}

f2 = function(x){
    return(x*2)
}

input = c(1,99)

plan = drake::drake_plan(
    targ1 = target(
        f1(input)
        , dynamic = map(input)
    )
    , targ2 = target(
        f2(targ1)
        , dynamic = map(targ1)
    )
)
drake::make(
    plan
)

As coded, drake throws an error when processing targ2 because the list in each sub-target of targ1 hasn't been broken apart yet. Obviously I could rewrite f2 to iterate over the list, but this is for demonstration purposes, and in my actual use case there are good reasons for wanting to simply split out the results from targ1.

I thought I had it solved with:

f1 = function(x){
    return(list(x,x+1))
}

f2 = function(x){
    return(x*2)
}

input = c(1,99)

plan = drake::drake_plan(
    targ1 = target(
        f1(input)
        , dynamic = map(input)
    )
    , targ2 = target(
        unlist(targ1)
    )
    , targ3 = target(
        f2(targ2)
        , dynamic = map(targ2)
    )
)

But in my real use case each subtarget takes up a lot of memory, and the computation of targ2 appears to necessitate bringing them all into memory, causing a lock up as my machine runs out of memory.

I've worked out a hack where I save the individual list elements from each sub-target in targ1 to file, then do a list.files() search for all such files as input to later targets, but maybe there's a simpler way?

Here's the hack that's "working" but surely less than ideal:

library(drake)

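f1 = function(x){
    out = list(x,x+1)
    # seq_along() is safe even if out is empty (1:length(out) would give 1:0)
    for(i in seq_along(out)){
        a = out[[i]]
        # write each chunk to its own file, named by the hash of its contents
        save(a,file=paste0(digest::digest(a),'.rda'))
    }
    return(digest::digest(out))
}

# x is unused; passing targ1 only makes drake run this after targ1 is built
f2 = function(x){
    # pattern is a regex: escape the dot and anchor at the end of the name
    list.files(pattern='\\.rda$')
}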

f3 = function(this_rda){
    load(this_rda)
    return(a)
}

f4 = function(x){
    return(x*2)
}

input = c(1,99)

plan = drake::drake_plan(
    targ1 = target(
        f1(input)
        , dynamic = map(input)
    )
    , targ2 = target(
        f2(targ1)
    )
    , targ3 = target(
        f3(targ2)
        , dynamic = map(targ2)
    )
    , targ4 = target(
        f4(targ3)
        , dynamic = map(targ3)
    )
)
drake::make(plan)
readd(targ4)

Mike Lawrence

1 Answer

drake does not support dynamic branching within dynamic sub-targets, but you can combine static branching with dynamic branching to achieve something very similar.

library(drake)
input_values <- c(1, 99)
plan <- drake_plan(
  targ1 = target(
    f1(input),
    transform = map(input = !!input_values)
  ),
  targ2 = target(
    f2(targ1),
    transform = map(targ1),
    dynamic = map(targ1)
  )
)

drake_plan_source(plan)
#> drake_plan(
#>   targ1_1 = f1(1),
#>   targ1_99 = f1(99),
#>   targ2_targ1_1 = target(
#>     command = f2(targ1_1),
#>     dynamic = map(targ1_1)
#>   ),
#>   targ2_targ1_99 = target(
#>     command = f2(targ1_99),
#>     dynamic = map(targ1_99)
#>   )
#> )

Created on 2020-05-28 by the reprex package (v0.3.0)
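
The plan above fails at `make()` if f1 returns a list and f2 expects a number. Here is a minimal base-R sketch of why, assuming (as noted in the comments below) that dynamic branching treats a list as a vector, so each sub-target receives a length-one list rather than the bare element. The `[` slice here only imitates that behavior; it is not drake's actual internals. f1 and f2 are the functions from the question.

```r
# f1 and f2 as defined in the question
f1 <- function(x) list(x, x + 1)
f2 <- function(x) x * 2

out <- f1(1)

# Assumption: a dynamic sub-target sees something like a `[` slice,
# i.e. a list of length one, not the element itself.
slice <- out[1]

# Passing the slice straight to f2 fails, because a list cannot be multiplied.
err_msg <- tryCatch(f2(slice), error = function(e) conditionMessage(e))

# Unwrapping with [[1]] hands f2 the bare numeric element.
fixed <- f2(slice[[1]])

# The same unwrapping applied to every element of the list.
all_fixed <- vapply(seq_along(out), function(i) f2(out[i][[1]]), numeric(1))
```

Under that assumption, the command for targ2 in the plan would be `f2(targ1[[1]])` rather than `f2(targ1)`.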

landau
  • Thanks! Though your solution doesn't seem to make:
    ``` r
    #> ▶ target targ1_99
    #> ▶ target targ1_1
    #> ▶ dynamic targ2_targ1_99
    #> > subtarget targ2_targ1_99_a1ac23fc
    #> x fail targ2_targ1_99_a1ac23fc
    #> Error: target targ2_targ1_99_a1ac23fc failed.
    #> diagnose(targ2_targ1_99_a1ac23fc)error$message:
    #> non-numeric argument to binary operator
    #> diagnose(targ2_targ1_99_a1ac23fc)error$calls:
    #> 1. └─global::f2(targ1_99)
    ```
    (I'm running drake 7.12.1.9000 [drake@5798e3b]) – Mike Lawrence May 28 '20 at 15:02
  • It was just a rough sketch of the general idea. You may have to adjust the functions or commands. Dynamic branching treats lists as vectors, so you may have to call f2(targ1_1[[1]]) instead of f2(targ1_1). – landau May 28 '20 at 16:07
  • Ah, shoot. I got this working by doing targ1[[1]], but when translated into my more voluminous input data scenario I'm back to running out of RAM, as in the attempt I described in my initial post. (Note that the hack at the end of the post works fine with no RAM issues.) I actually don't understand why your solution causes a RAM issue where writing to and reading from files doesn't. Could it be as simple as the file system bottlenecking the latter, so it doesn't even have the chance to run out of RAM before earlier workers finish and release their memory? – Mike Lawrence Jun 02 '20 at 18:30
  • It's hard to say from what I know. Have you tried garbage collection and memory strategies from https://books.ropensci.org/drake/memory.html? – landau Jun 02 '20 at 21:46
  • Yeah, I have `memory_strategy = 'autoclean'` and `garbage_collection = TRUE`, plus also do a `gc()` call before returning in all my functions. – Mike Lawrence Jun 03 '20 at 19:28