
I want to use the R package targets to run shell commands and read their results, e.g. the exit code, into a new target. The commands are organized as a tibble with metadata attached to them. Currently, I have a workflow that runs the command of each row.

However, if I delete one of the output files, the corresponding sub-target is not recreated but simply skipped. Recreating it is required, because the output file content is intentionally not stored in the target R object. To solve this, one needs to insert a new target with tar_target(format = "file") for each row of the commands table. I cannot simply let the function call_shell return the file name as a character vector, because I need to join metadata downstream, e.g. with calls %>% left_join(commands).

I've read the targetopia contributing page. Unfortunately, I am unable to extend the factory example to work with dynamic branching.

This is my _targets.R file:

library(tidyverse)
library(targets)

#' Execute a shell command and save the output to a file
call_shell <- function(command = "echo hi", file = "out.txt") {
  exit_code <-
    command %>%
    paste0(" > ", file) %>%
    system()

  list(
    exitcode = exit_code,
    size = file.info(file)$size
  )
}

list(
  tar_target(
    commands,
    {
      tribble(
        ~id, ~command, ~long_command,
        1, "echo foo", FALSE,
        2, "echo foo bar baz", TRUE
      )
    }
  ),
  tar_target(
    calls,
    command = {
      commands %>%
        mutate(call = command %>% map(~ call_shell(
          command = .x, file = paste0(tar_name(), ".txt")
        )))
    },
    pattern = map(commands)
  )
)

How can I write a function tar_target_shell that creates two targets: one for the exit codes and one to track potentially missing output files?

danlooo
  • It seems easier to represent shell commands as strings embedded in R code instead of external files. If you do that, I do not think you will need a target factory for this. – landau Mar 17 '22 at 18:54
  • Thanks for responding! The `echo` is a stub for this reprex. The output file, e.g. `calls_7ff9dcaf.txt`, currently contains just `foo` but can be many GB in the future and can be of any file format, so it should not be serialized into rds. `commands` might be created by other targets as well. `tar_make()` only rebuilds this txt file iff target `calls` is a character vector of filenames. But then I lose the metadata and cannot do e.g. `calls %>% left_join(commands)` anymore. I think I need a factory here... – danlooo Mar 17 '22 at 20:00
  • Seems like the value and the exit code would have to come from the same target (in this case, branch). You could write both to files and have the file target return a character vector with both paths. – landau Mar 17 '22 at 21:47
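
The suggestion in the last comment could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested solution for the real workload: `call_shell_files`, the target name `call_files`, and the `.exit` suffix are hypothetical names introduced here, and it assumes a Unix-like shell where `>` redirects stdout. The idea is that one branch writes both the output and the exit code to files and returns both paths, while a second, small target reads the exit code back and keeps `id` for joining.

```r
library(targets)
library(tibble)

# Hypothetical helper (not part of targets): run `command`, redirect its stdout
# to `out_file`, write the exit code to `code_file`, and return both paths.
call_shell_files <- function(command, out_file,
                             code_file = paste0(out_file, ".exit")) {
  exit_code <- system(paste(command, ">", shQuote(out_file)))
  writeLines(as.character(exit_code), code_file)
  c(out_file, code_file)
}

list(
  tar_target(
    commands,
    tribble(
      ~id, ~command, ~long_command,
      1, "echo foo", FALSE,
      2, "echo foo bar baz", TRUE
    )
  ),
  # File target: one branch per row of `commands`; each branch returns the
  # output path and the exit-code path so that both files are tracked.
  tar_target(
    call_files,
    call_shell_files(commands$command, paste0(tar_name(), ".txt")),
    pattern = map(commands),
    format = "file"
  ),
  # Small metadata target: reads the exit code back from its file and keeps
  # `id` so the result can still be joined with `commands`.
  tar_target(
    calls,
    tibble(
      id = commands$id,
      file = call_files[1],
      exitcode = as.integer(readLines(call_files[2])),
      size = file.size(call_files[1])
    ),
    pattern = map(commands, call_files)
  )
)
```

With this layout, deleting one of the `.txt` files should invalidate the matching `call_files` branch on the next `tar_make()`, while `calls` stays a small tibble that can still be used as in `calls %>% left_join(commands, by = "id")`.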

0 Answers