Hi, I am new to the drake R package and would like to hear some opinions on best practices for splitting a large project into subtasks. A simplified version of my project has two parts: 1) data cleaning and 2) modeling. They are cascaded in the sense that I do the data cleaning first and rarely go back to it once I start on the modeling.
I think the approach suggested by the manual is:
```r
library(drake)
source("functions_1.R") # functions for plan1
source("functions_2.R") # functions for plan2

plan1 <- drake_plan(
  # many intermediate steps to create for_analysis
  foo = some_function(),
  foo_1 = fn_1(foo),
  foo_2 = fn_2(foo_1),
  for_analysis = data_cleaning_fn()
)

plan2 <- drake_plan(
  # I would like to reuse the target name foo_1, but for a different object
  # than the one defined in plan1.
  # What I want:
  # foo_1 = fn_new_1(for_analysis) # different from plan1's foo_1
  # result = model_fn(foo_1)
  # What I actually did:
  foo_new_1 = fn_new_1(for_analysis), # a new name, distinct from foo_1
  result = model_fn(foo_new_1)
)

fullplan <- bind_plans(plan1, plan2)
make(fullplan)
```
One problem with the workflow above is that `plan1` defines a lot of intermediate targets that are useless in `plan2`.
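One possible way to keep these intermediate targets out of the modeling stage is to run the two stages as separate drake projects with separate caches, and to hand `for_analysis` across through a file tracked with `file_out()` and `file_in()`. This is a sketch under assumptions: it targets drake 7.x, the toy function bodies, the file name `for_analysis.rds`, and the cache folders `.drake_cleaning` and `.drake_modeling` are hypothetical, and the drake part only runs if the package is installed.

```r
# Sketch: two drake stages that share only one file.
# Assumptions: drake 7.x; the toy functions, "for_analysis.rds", and the
# cache folder names are hypothetical stand-ins.
if (requireNamespace("drake", quietly = TRUE)) {
  library(drake)

  # Toy stand-ins for the real cleaning and modeling functions.
  some_function <- function() data.frame(x = 1:10)
  fn_1 <- function(d) transform(d, y = x * 2)
  data_cleaning_fn <- function(d) d[d$y > 4, ]
  fn_new_1 <- function(d) transform(d, z = log(y))
  model_fn <- function(d) lm(z ~ x, data = d)

  # Stage 1: all intermediate targets live only in the cleaning cache.
  plan1 <- drake_plan(
    foo = some_function(),
    foo_1 = fn_1(foo),
    for_analysis = data_cleaning_fn(foo_1),
    export = saveRDS(for_analysis, file_out("for_analysis.rds"))
  )

  # Stage 2: starts from the exported file, so foo_1 can be reused here.
  plan2 <- drake_plan(
    for_analysis = readRDS(file_in("for_analysis.rds")),
    foo_1 = fn_new_1(for_analysis),
    result = model_fn(foo_1)
  )

  # One cache per stage keeps the two sets of target names apart.
  cache1 <- new_cache(".drake_cleaning")
  cache2 <- new_cache(".drake_modeling")
  make(plan1, cache = cache1)
  make(plan2, cache = cache2)

  # Read the fitted model back from the modeling cache.
  fit <- readd(result, cache = cache2)
}
```

Because each stage has its own cache, `foo_1` in the modeling stage no longer collides with `foo_1` in the cleaning stage. The tradeoff is that drake no longer sees one end-to-end dependency graph, so `plan1` has to be made before `plan2`.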
- Is there a way to give `plan2` a "clean namespace", so that I can get rid of the useless names `foo_1`, `foo_2`, etc. and reuse them in `plan2`? The only target from `plan1` that I want to keep in `plan2` is `for_analysis`.
- Is there a way to use the functions defined in `functions_1.R` only for `plan1` and the functions defined in `functions_2.R` only for `plan2`? I would like to work with a smaller set of functions at a time.
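For the second point, one approach, assuming a drake version whose `make()` takes an `envir` argument (drake 7.x does), is to `source()` each helper file into its own environment and pass that environment to `make()`. In this sketch the temp files are hypothetical stand-ins for `functions_1.R` and `functions_2.R`, and the `make()` calls are left as comments so the snippet runs with base R alone.

```r
# Sketch: load each stage's helper functions into its own environment.
# Assumption: the two temp files are hypothetical stand-ins for
# functions_1.R and functions_2.R.
functions_1 <- tempfile(fileext = ".R")
functions_2 <- tempfile(fileext = ".R")
writeLines(c(
  "fn_1 <- function(d) d + 1",
  "fn_2 <- function(d) d * 2"
), functions_1)
writeLines(c(
  "fn_new_1 <- function(d) d - 1",
  "model_fn <- function(d) mean(d)"
), functions_2)

# The global environment is the parent, so base R functions stay visible.
env1 <- new.env(parent = globalenv())
env2 <- new.env(parent = globalenv())
source(functions_1, local = env1)
source(functions_2, local = env2)

# Each environment holds only its own stage's functions.
ls(env1)  # "fn_1" "fn_2"
ls(env2)  # "fn_new_1" "model_fn"

# With drake installed, each plan would then be made against its own
# environment (plan1 and plan2 as defined in the question):
# drake::make(plan1, envir = env1)
# drake::make(plan2, envir = env2)
```

Since both environments have the global environment as their parent, anything defined at the top level is still visible to both plans, so the stage-specific files should not also be sourced into the global workspace.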
Thanks a lot!