I am trying to use a targets workflow in my R project to download water quality data with the dataRetrieval package. In a fresh R session, this works:

dataRetrieval::readWQPdata(siteid = "USGS-04024315", characteristicName = "pH")

To use this in targets, I have the following _targets.R file:

library(targets)

tar_option_set(packages = c("dataRetrieval"))

list(
  tar_target(
    name = wqp_data,
    command = readWQPdata(siteid = "USGS-04024315", characteristicName = "pH"),
    format = "feather",
    cue = tar_cue(mode = "never")
  )
)

When I run tar_make(), the following is returned:

* start target wqp_data
No internet connection.
The following url returned no data:

https://www.waterqualitydata.us/data/Result/search?siteid=USGS-04024315&characteristicName=pH&zip=yes&mimeType=tsv
x error target wqp_data
* end pipeline
Error : attempt to set an attribute on NULL
Error: callr subprocess failed: attempt to set an attribute on NULL
Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
Run `rlang::last_error()` to see where the error occurred.

I tried debugging with tar_option_set(debug = "wqp_data") and tar_option_set(workspace_on_error = TRUE), but beyond isolating the error to readWQPdata() I didn't get anywhere.

Using curl directly in a target also succeeds, so I do not think the problem is my actual internet connection:

list(
  tar_target(
    name = wqp_data,
    command = {
      con <- curl::curl("https://httpbin.org/get")
      lines <- readLines(con)
      close(con)
      lines  # return the fetched lines rather than the result of close()
    }
  )
)

tar_make()
* start target wqp_data
* built target wqp_data
* end pipeline

Any advice on how to diagnose the connection issue when using these two packages?

mpschramm
  • Thanks for the reproducible example. Unfortunately for you though, the pipeline completed without errors when I ran it with `tar_make()`. But for what it's worth, the internet connection check is probably happening at https://github.com/USGS-R/dataRetrieval/blob/b60c41a2b5a36e60a5d08bfeabc974dfc4bf83a0/R/getWebServiceData.R#L25-L28. Maybe run `debug(dataRetrieval:::getWebServiceData); tar_make(callr_function = NULL)` and look for strangeness in your execution environment? – landau Oct 05 '21 at 20:05
  • @landau thank you!!! That got me on the right track, although I still don't understand the underlying reason. I don't think it is targets related though. `curl::has_internet()` returns FALSE both in my targets project and in a new, clean R session, yet the dataRetrieval functions complete the API calls in the clean R session for some reason. Regardless, https://stackoverflow.com/questions/59796178/r-curlhas-internet-false-even-though-there-are-internet-connection offers a solution. – mpschramm Oct 05 '21 at 20:58
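
Following up on the comments, here is a minimal diagnostic sketch (not from the original thread) that compares what `curl::has_internet()` reports with a DNS lookup and an actual HTTP request. It assumes the curl package is installed. `check_connection()` is a hypothetical helper name, and the idea that dataRetrieval relies on `curl::has_internet()` at the linked lines is inferred from the comments, not verified.

```r
# Hypothetical helper: compare three views of "is there a connection?".
# Assumes the curl package is installed; host and url defaults are illustrative.
check_connection <- function(host = "www.waterqualitydata.us",
                             url = paste0("https://", host)) {
  list(
    # What curl itself reports; depending on the curl version this may
    # rely on a DNS lookup rather than on a real request.
    has_internet = curl::has_internet(),
    # DNS lookup only; nslookup() returns NULL on failure when error = FALSE.
    dns_ok = !is.null(curl::nslookup(host, error = FALSE)),
    # A real HTTP request; TRUE if it completes at all, whatever the status code.
    http_ok = tryCatch({
      curl::curl_fetch_memory(url)
      TRUE
    }, error = function(e) FALSE)
  )
}

str(check_connection())
```

Running this once in a clean session and once from a temporary target (with the function defined in `_targets.R`, for example `tar_target(net_check, check_connection(), cue = tar_cue(mode = "always"))`) should show whether the two environments disagree. If `http_ok` is TRUE while `has_internet` is FALSE, the problem is more likely in the connectivity check, for example behind a proxy, than in the connection itself.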
