
I am running into a bizarre error where tar_make() and/or tar_make_future() will hang indefinitely. This happens with both plan(sequential) and plan(future.callr), so it is not the same as this issue.

This is a little tricky because I am using non-public packages. I will attempt to make a reprex later. The issue seems to be with propagating an error from a deeply nested function call. Calling stop() in the top-level function analysis_workflow(), or in the function it calls, analysis_factory(), works fine and causes the target to stop with an error. But when the error is raised deeper (by a function in a package), it seems to lock up tar_make().

The error propagates fine when I call the function interactively. Bizarrely, using tar_option_set(debug = <dynamic branch>) does not seem to work either; the target hangs before entering debug mode.

Any ideas on what could be causing tar_make() to stall, from the limited info I can provide right now?

# _targets.R
# all workflow functions are in package:wp.batch
# package wp.batch imports package wp.analysis


library(targets)
library(wp.batch)
library(future)
targets::tar_option_set(packages = "wp.batch", error = "null", 
    workspace_on_error = TRUE)
plan("sequential")
alts = c("first", "second")
list(tar_target(config_file, "test_debug.xlsx", format = "file"), 
    tar_target(config, read_batch_config(config_file, alts)), 
    tar_target(data, data_workflow(config), pattern = map(config)), 
    tar_target(results, analysis_workflow(data), pattern = map(data)), 
    tar_target(plots, plot_workflow(results), pattern = map(results), 
        deployment = "main"))
# backtrace when running `analysis_workflow(data)` interactively
Backtrace:
     ▆
  1. ├─wp.batch::analysis_workflow(data)
  2. │ ├─dplyr::mutate(...)
  3. │ └─dplyr:::mutate.data.frame(...)
  4. │   └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  5. │     ├─base::withCallingHandlers(...)
  6. │     └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
  7. │       └─mask$eval_all_mutate(quo)
  8. │         └─dplyr (local) eval()
  9. └─purrr::map2(.data$data, .data$analysisInputs, analysis_factory)
 10.   └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
 11.     ├─purrr:::with_indexed_errors(...)
 12.     │ └─base::withCallingHandlers(...)
 13.     ├─purrr:::call_with_cleanup(...)
 14.     └─wp.batch (local) .f(.x[[i]], .y[[i]], ...)
 15.       ├─base::do.call(summary_analysis, c(data, unlist(inputs, recursive = FALSE)))    
 16.       └─wp.analysis (local) `<fn>`(df = `<tibble[,8]>`, metrics = NA_character_, type = "metric by duration")
 17.         └─wp.analysis:::analysis_duration(...)
 18.           └─base::stop("analysis_duration() not implemented")
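Because the packages are private, one possible starting point for the reprex is a public-package stand-in that mirrors the shape of the backtrace: dplyr::mutate() calls purrr::map2(), which calls a factory that uses do.call() to reach a function that calls stop(). The function names below are borrowed from the backtrace, but their bodies and the `input` data are hypothetical; only dplyr and purrr are assumed.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-ins for the private wp.analysis / wp.batch functions,
# arranged to mirror the call chain in the backtrace.
analysis_duration <- function(df, ...) {
  stop("analysis_duration() not implemented")
}

summary_analysis <- function(df, metrics, type) {
  analysis_duration(df, metrics = metrics, type = type)
}

# Called once per row by purrr::map2(); forwards the inputs via do.call().
analysis_factory <- function(data, inputs) {
  do.call(summary_analysis, c(list(df = data), inputs))
}

# Top level: dplyr::mutate() -> purrr::map2() -> analysis_factory().
analysis_workflow <- function(nested) {
  mutate(nested, results = map2(data, analysisInputs, analysis_factory))
}

# Hypothetical one-row nested input: list-columns of data and of inputs.
input <- tibble(
  data = list(tibble(x = 1:3)),
  analysisInputs = list(list(metrics = NA_character_, type = "metric by duration"))
)

# Interactively, the deep error should surface as a chained dplyr/purrr error.
err <- tryCatch(analysis_workflow(input), error = function(e) e)
print(err)
```

If this errors promptly at the console but hangs when the same call is the command of a single target under tar_make(), the problem is reproducible without the private packages. If it does not hang, that points toward something specific to wp.batch or wp.analysis.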
mikeck
I suggest peeling back more layers and seeing how small you can make your test case. That will help you isolate the problem. Can you reproduce the error without dynamic branching? With a single target? If a single target, can you reproduce it in a `callr::r()` process instead of `tar_make()`? (cf. https://books.ropensci.org/targets/debugging.html#system-issues) – landau Apr 12 '23 at 17:56
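Following that suggestion, here is a minimal sketch of running the failing call in a `callr::r()` subprocess, which is how `tar_make()` runs the pipeline by default, with a `timeout` so that a hang is reported as a distinct outcome instead of blocking forever. The helper name `check_in_subprocess()` and the stand-in `deep_fn()` are hypothetical; in the real project the subprocess function would load wp.batch and call analysis_workflow() on one branch of data.

```r
# Run the suspect code in a fresh R session via callr::r() and give up
# after `timeout` seconds, so a hang is reported instead of blocking.
check_in_subprocess <- function(timeout = 60) {
  tryCatch(
    {
      callr::r(
        function() {
          # Hypothetical stand-in for the deep failing call. callr::r()
          # runs this function in a new process, so it must be
          # self-contained (helpers defined inside, packages via ::).
          deep_fn <- function() stop("analysis_duration() not implemented")
          deep_fn()
        },
        timeout = timeout
      )
      "finished"
    },
    error = function(e) {
      # Treat any error whose class mentions a timeout as a hang;
      # anything else means the error propagated out of the subprocess.
      if (any(grepl("timeout", class(e)))) "hung" else "errored"
    }
  )
}

outcome <- check_in_subprocess()
print(outcome)
```

With the stand-in, the expected result is "errored". Swapping in the real call and getting "hung" would show that the stall happens outside targets itself.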

1 Answer


I was able to work around the issue by adding a tryCatch() statement to my top-level workflow functions:

workflow_function = function(...) {
  tryCatch(
    {
      # unchanged workflow code ...
    },
    # re-signal any caught error unchanged, so the target fails normally
    error = function(e) stop(e)
  )
}

With this wrapper the error is thrown as normal and the target fails instead of hanging, which is further evidence that the underlying problem has to do with error propagation.
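A base-R sketch of what the `error = stop` pattern changes (all function names are hypothetical). The re-thrown condition keeps its original message, but by the time it is re-signalled, tryCatch() has already unwound the deep frames, so any outer calling handler sees it from a much shallower call stack. Whether that shallower stack is what stops tar_make() from hanging is only an unconfirmed guess; the thread does not establish the cause.

```r
# Hypothetical stand-in for a deeply nested package call that fails.
deep_error <- function(depth) {
  if (depth == 0) stop("analysis_duration() not implemented")
  deep_error(depth - 1)
}

# Without the workaround: the error is signalled at the bottom of the stack.
workflow_direct <- function() deep_error(20)

# With the workaround: catch, unwind, then re-signal the same condition.
workflow_rethrow <- function() {
  tryCatch(deep_error(20), error = function(e) stop(e))
}

# Record the frame number at which an outer calling handler (standing in
# for whatever the caller installs) sees the error.
frames_at_signal <- function(f) {
  depth <- NA_integer_
  try(
    withCallingHandlers(f(), error = function(e) depth <<- sys.nframe()),
    silent = TRUE
  )
  depth
}

msg <- tryCatch(workflow_rethrow(), error = conditionMessage)
depth_direct <- frames_at_signal(workflow_direct)
depth_rethrow <- frames_at_signal(workflow_rethrow)
print(c(direct = depth_direct, rethrow = depth_rethrow))
```

The exact frame counts depend on the R version, but the re-thrown error should always be seen with fewer frames on the stack than the direct one.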

mikeck