Best practices for drake pipeline experimentation

Question

I'm new to drake but loving it so far. One thing I'm having trouble with is how to best go about experimenting with different pipeline configurations. That is, my plans consist purely of a chain of targets where the output from the first target is the input for the second, the second forms the input for the third, etc. My targets all have the same basic structure (dynamic targets with tibbles as individual entries) expected as input and supplied as output, and I want to experiment with different orderings, inclusion/exclusion of certain steps, etc. For example:

plan = drake::drake_plan(

    a_transformed = target(
        compute_a_transform(list_of_input_data)
        , dynamic = map(list_of_input_data)
    )

    , b_transformed = target(
        compute_b_transform(a_transformed)
        , dynamic = map(a_transformed)
    )

    , c_transformed = target(
        compute_c_transform(b_transformed)
        , dynamic = map(b_transformed)
    )

)

The way I've been using drake so far is that each target has a unique/meaningful name, so when I, for example, remove a target, I have to rename the input supplied to the subsequent target:

plan = drake::drake_plan(

    a_transformed = target(
        compute_a_transform(list_of_input_data)
        , dynamic = map(list_of_input_data)
    )

    #, b_transformed = target(
    #    compute_b_transform(a_transformed)
    #    , dynamic = map(a_transformed)
    #)

    #note: the b-transform step has been removed (commented out), so the inputs to compute_c_transform have to change from `b_transformed` to `a_transformed`

    , c_transformed = target(
        compute_c_transform(a_transformed) #had to rename things here
        , dynamic = map(a_transformed) #and here
    )

)

Would it be too much to hope that there's a better way of experimenting that doesn't require this manual commenting-out and renaming?

I am not sure I understand. What are examples of plans you are iterating on, and what exactly is inconvenient about the process? What you describe sounds to me like [ordinary refactoring](https://en.wikipedia.org/wiki/Code_refactoring). The only difference with `drake` is that users typically want to avoid repeating long-running tasks. If you are just trying to learn and experiment, maybe you can start with fast code so `make()` does not take long. Or, invoke `make()` with `recover = TRUE` when you revert back to old targets. — landau, Jan 27 '20 at 02:10
Apologies, I should have included an example. I've done so now. Thanks also for your suggestions on using `make()` efficiently, but as is hopefully made more clear by the example, my question pertains more to efficiently changing the code for a `plan()`. — Mike Lawrence, Jan 28 '20 at 14:12
Ah, okay, thanks for spelling it out. This helps me understand. — landau, Jan 28 '20 at 15:03
Is it the renaming itself that bothers you or the runtime of successive `make()`s? In the latter case, there are shortcuts you can take to temporarily speed things up. In the former case, writing and editing code is an unavoidable part of using `drake`, so I do not think anything can be done. But the `drakeplanner` Shiny app may help you quickly do the iteration to make sure your targets are connected properly: https://github.com/wlandau/drakeplanner, https://wlandau.shinyapps.io/drakeplanner. — landau, Jan 28 '20 at 15:07
The renaming itself is what I'm looking for a more elegant solution to. Seeing `drakeplanner` leaves me wondering if some sort of GUI-based solution might be achievable where the user connects targets to define their inputs/outputs. So if I wanted to remove b from the chain, I'd simply connect a_transformed right to c_transformed. But if this doesn't already exist, I can see it being a big lift to implement. — Mike Lawrence, Jan 28 '20 at 18:55
I won't protest if others want to develop such a GUI, but drakeplanner is the closest I will go. Writing code is an unavoidable requirement of using drake, so I think a GUI would need to let people write code too. I have had bad experiences with such hybrids. — landau, Jan 28 '20 at 21:54

score 0 · Answer 1 · answered Feb 14 '20 at 14:30

I worked out a method that is a bit of a hack but works for me. I simply add a `skip` argument to each function; when it is TRUE, the function returns its input unchanged:

compute_a_transform = function(x, skip = FALSE){
    # when skipping, pass the input through untouched
    if(skip){
        return(x)
    }
    # ... regular compute_a_transform stuff here
}
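
If editing the body of every transform is tedious, the same pass-through behaviour could also be added from the outside with a wrapper. The following is only a minimal base R sketch: `skippable()` and the toy `double_it()` are hypothetical names made up for illustration, not part of drake, and whether drake's dependency detection sees through such a closure is not verified here.

```r
# Hypothetical helper (not part of drake): wrap a transform so that
# skip = TRUE returns the input untouched instead of calling the transform.
skippable <- function(f) {
  function(x, skip = FALSE, ...) {
    if (skip) {
      return(x)
    }
    f(x, ...)
  }
}

# Toy stand-in for a real transform, for illustration only.
double_it <- function(x) x * 2

double_it_skippable <- skippable(double_it)

applied <- double_it_skippable(1:3)               # transform runs
skipped <- double_it_skippable(1:3, skip = TRUE)  # input passes through
```

The wrapped function takes the same `skip` argument as the hand-written version, so it can be toggled in a plan in the same way.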

Then, when I want to skip a step in the processing chain, I simply set `skip=TRUE`, without commenting out or renaming anything:

plan = drake::drake_plan(

    a_transformed = target(
        compute_a_transform(list_of_input_data)
        , dynamic = map(list_of_input_data)
    )

    , b_transformed = target(
        compute_b_transform(a_transformed, skip=TRUE) #skip=TRUE means the b-transform isn't actually applied
        , dynamic = map(a_transformed)
    )

    , c_transformed = target(
        compute_c_transform(b_transformed)
        , dynamic = map(b_transformed)
    )

)
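
For trying out different orderings as well as inclusion/exclusion, one possible alternative is to describe the chain as an ordered list of functions and apply them in sequence. This is a sketch in plain base R under stated assumptions: `run_chain()` and the three toy steps are hypothetical stand-ins for `compute_a_transform()` and friends, and nothing here is a drake feature.

```r
# Hypothetical toy steps standing in for compute_a/b/c_transform.
add_one   <- function(x) x + 1
times_two <- function(x) x * 2
square    <- function(x) x^2

# Apply an ordered list of functions to x, left to right.
# Reduce() starts from x and feeds each result into the next step.
run_chain <- function(x, steps) {
  Reduce(function(value, step) step(value), steps, x)
}

full      <- run_chain(3, list(add_one, times_two, square))  # ((3 + 1) * 2)^2
no_middle <- run_chain(3, list(add_one, square))             # (3 + 1)^2
reordered <- run_chain(3, list(square, times_two, add_one))  # 3^2 * 2 + 1
```

Dropping or reordering a step then means editing one list rather than renaming inputs. The trade-off is that, if such a chain is called inside a single target, drake caches only the final result of that target and not each intermediate step.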
