I create a set of files in my drake plan. I want to copy a subset of these files to another location.

The following code almost achieves that. However, drake's dependency tracking of file changes is lost after taking the subset of file targets that I want to copy.

How can I combine/subset dynamic file targets without losing drake's dependency tracking?

copy_file <- function(file) {
  file_copy <- paste0(file, "_copy")
  file.copy(from = file, to = file_copy, overwrite = TRUE)
  file_copy
}

herb_1_a <- "parsley"
plan <- drake::drake_plan(
  file_1 = target(
    {
      writeLines(herb_1_a, "file_1_a") # herb_1_a changes before the second make()
      writeLines("sage", "file_1_b")
      c("file_1_a", "file_1_b")
    },
    format = "file"
  ),

  file_2 = target(
    {
      writeLines("rosemary", "file_2_a")
      writeLines("thyme", "file_2_b")
      c("file_2_a", "file_2_b")
    },
    format = "file"
  ),

  files_to_copy = stringr::str_subset(
    c(file_1, file_2),
    "_a$"
  ),

  file_copies = target(
    copy_file(files_to_copy),
    dynamic = map(files_to_copy),
    format = "file"
  )
)

drake::make(plan)
#> ▶ target file_2
#> ▶ target file_1
#> ▶ target files_to_copy
#> ▶ dynamic file_copies
#> > subtarget file_copies_5e57e9ee
#> > subtarget file_copies_ae26ecf9
#> ■ finalize file_copies
readLines("file_1_a")
#> [1] "parsley"
readLines("file_1_a_copy")
#> [1] "parsley"
herb_1_a <- 'banana'
drake::make(plan)
#> ▶ target file_1
#> ▶ target files_to_copy
readLines("file_1_a")
#> [1] "banana"
readLines("file_1_a_copy") # I want this banana
#> [1] "parsley"

Created on 2020-09-24 by the reprex package (v0.3.0)

robust

1 Answer

I think the solution is to create a dynamically mapped target of file paths right before the copying step. In other words, files_to_copy should itself be a dynamic target with format = "file", so that drake tracks each file it returns and reruns the copy when a file's contents change. Sketch:

plan <- drake::drake_plan(
  file_1 = target(
    {
      writeLines(herb_1_a, "file_1_a") # herb_1_a changes before the second make()
      writeLines("sage", "file_1_b")
      c("file_1_a", "file_1_b")
    },
    format = "file"
  ),
  
  file_2 = target(
    {
      writeLines("rosemary", "file_2_a")
      writeLines("thyme", "file_2_b")
      c("file_2_a", "file_2_b")
    },
    format = "file"
  ),
  
  files_to_copy_group = stringr::str_subset(
    c(file_1, file_2),
    "_a$"
  ),
  
  files_to_copy = target(
    files_to_copy_group,
    dynamic = map(files_to_copy_group),
    format = "file"
  ),
  
  file_copies = target(
    copy_file(files_to_copy),
    dynamic = map(files_to_copy),
    format = "file"
  )
)
landau
  • Thank you, that works perfectly! And I am surprised that it does. The output of `readd(files_to_copy_group)` does not change at all between the first and the second `make(plan)`. How does drake know that one of the two files changed if the only upstream dependency of `files_to_copy` does not change? Does creating a target with `format = "file"` mean that drake also directly checks the file contents (or hash) for changes in addition to checking for any changes in upstream dependencies? – robust Oct 03 '20 at 01:11
  • Yes, exactly. `format = "file"` tells `drake` to check the hashes of the file paths returned. – landau Oct 03 '20 at 02:57
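
To illustrate the point made in the comments, here is a minimal base-R sketch of why a plain character vector of paths looks unchanged between runs while a content hash does not. It does not use drake at all: `tools::md5sum()` is only a stand-in for whatever hashing drake does internally, and the temporary file location is an assumption made for the example.

```r
# Illustration only: tools::md5sum() stands in for drake's own file hashing.
path <- file.path(tempdir(), "file_1_a")

# First "run": write the file, record the path and a hash of its contents.
writeLines("parsley", path)
paths_before <- path
hash_before <- unname(tools::md5sum(path))

# Second "run": same path, different contents.
writeLines("banana", path)
paths_after <- path
hash_after <- unname(tools::md5sum(path))

# A target that only stores the path sees no change ...
paths_unchanged <- identical(paths_before, paths_after)

# ... while a target that hashes the file behind the path does.
contents_changed <- !identical(hash_before, hash_after)
```

This mirrors the situation in the question: `files_to_copy_group` holds only the path strings, so its value is identical across runs, whereas a `format = "file"` target is compared by the contents of the files it returns.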