The problem
Hello everyone, I am new to the targets package and I am trying to build a workflow. The workflow makes extensive use of a package I am developing called SDMWorkflows, which you can install from GitHub using remotes:
remotes::install_github("Sustainscapes/SDMWorkflows")
I put the code and dataset here in this repo
What I am trying to do is download, clean, and then model several species. So far I am using these functions, which I added to the R folder:
# Read the species names from the tracked CSV file
Read_Data <- function(Data){
  readr::read_csv(Data) |>
    dplyr::pull(Species)
}
# Download occurrences for one species
DownloadSpecies <- function(Cleaned){
  SDMWorkflows::GetOccs(Species = Cleaned,
                        continent = "europe",
                        limit = 1000,
                        WriteFile = FALSE)
}
# Clean the coordinates of the downloaded presences
Clean_Coord <- function(Coords){
  SDMWorkflows::clean_presences(Coords[[1]])
}
And I am using the following pipeline:
library(targets)
tar_option_set(
  packages = c("SDMWorkflows", "readxl", "janitor", "dplyr", "purrr"),
  format = "rds" # default storage format
)
tar_source()
list(
  tar_target(file, "Species.csv", format = "file"),
  tar_target(data, Read_Data(file)),
  tar_target(
    Species_Presences,
    DownloadSpecies(data),
    pattern = map(data),
    iteration = "list"
  ),
  tar_target(Cleaned_Coordinates,
             Clean_Coord(Species_Presences),
             pattern = map(Species_Presences),
             iteration = "group",
             error = "null")
)
The problem is that some of the Species_Presences branches that come out of the DownloadSpecies function are empty data frames, which I expect to happen for some of the 30,000 species I will test later. I am trying to figure out how to keep the workflow going past these cases, because setting error = "null" apparently does not let the pipeline continue.
[![enter image description here][1]][1]
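To make the situation concrete, here is a minimal base R sketch of the kind of guard that would let empty downloads be skipped rather than fail. It uses toy data, not real GetOccs output: toy_clean is a hypothetical stand-in for SDMWorkflows::clean_presences (and the assumption that it errors on empty input is mine), and safe_clean is a hypothetical wrapper name.

```r
# Hypothetical stand-in for SDMWorkflows::clean_presences(); it errors on
# empty input to mimic the failure described above (an assumption).
toy_clean <- function(df) {
  if (nrow(df) == 0) stop("no presences to clean")
  df[!is.na(df$decimalLatitude) & !is.na(df$decimalLongitude), , drop = FALSE]
}

# Guard: return NULL for empty or missing input instead of calling the cleaner.
safe_clean <- function(df) {
  if (is.null(df) || nrow(df) == 0) return(NULL)
  toy_clean(df)
}

# Toy presences: one species with records, one species with none.
presences <- list(
  data.frame(species = "A",
             decimalLatitude = c(55.1, NA),
             decimalLongitude = c(10.2, 9.8)),
  data.frame(species = character(),
             decimalLatitude = numeric(),
             decimalLongitude = numeric())
)

cleaned <- lapply(presences, safe_clean)
# Drop the NULL entries so only species with presences remain.
cleaned <- Filter(Negate(is.null), cleaned)
```

Whether a guard like this fits inside Clean_Coord depends on what clean_presences actually does with an empty data frame, which is not verified here.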
What I have tried
First try
I tried adding a function called combine_presences in between. I thought I could use purrr::reduce(bind_rows) to combine everything and then split again by species, which would give a new mapped object containing only the species that still have presences before doing the coordinate cleaning. The function looks like this:
combine_presences <- function(Species) {
  Species[[1]][[1]] |>
    dplyr::select("key", "scientificName", "decimalLatitude", "decimalLongitude",
                  "kingdom", "phylum", "order", "family", "genus", "species") |>
    dplyr::group_split(species)
}
and modifying the targets code as follows:
list(
  tar_target(file, "Species.csv", format = "file"),
  tar_target(data, Read_Data(file)),
  tar_target(
    Species_Presences,
    DownloadSpecies(data),
    pattern = map(data),
    iteration = "list"
  ),
  tar_target(Join_and_Split,
             combine_presences(Species_Presences)),
  tar_target(Cleaned_Coordinates,
             Clean_Coord(Join_and_Split),
             pattern = map(Join_and_Split),
             iteration = "group",
             error = "null")
)
However, with this I only get one species out of the whole set.
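A possible explanation, sketched with toy data: Species[[1]][[1]] indexes only the first branch of the aggregated target, so every other species is discarded before the split. The nested shape below (one list per branch, each holding one data frame) is an assumption inferred from Clean_Coord using Coords[[1]], not a confirmed description of what GetOccs returns. The sketch uses base R so that it runs on its own; dplyr::bind_rows and dplyr::group_split would play the roles of rbind and split.

```r
# Toy stand-in for the aggregated Species_Presences target: one element per
# branch, each a list holding one data frame (assumed shape).
Species <- list(
  list(data.frame(species = "A", key = 1:2)),
  list(data.frame(species = character(), key = integer())),
  list(data.frame(species = "B", key = 3L))
)

# What combine_presences() indexes: the data frame of the first branch only.
first_only <- Species[[1]][[1]]

# Sketch of the intended behaviour: take the data frame from every branch,
# drop the empty ones, bind the rest together, then split by species.
frames <- lapply(Species, function(x) x[[1]])
frames <- Filter(function(df) nrow(df) > 0, frames)
combined <- do.call(rbind, frames)
by_species <- split(combined, combined$species)
```

Here first_only contains only species "A", while by_species has one element each for "A" and "B" and no entry for the empty download.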
Second try
I then tried to map it like this:
list(
tar_target(file, "Species.csv", format = "file"),
tar_target(data, Read_Data(file)),
tar_target(
Species_Presences,
DownloadSpecies(data),
pattern = map(data),
iteration = "list"
),
tar_target(Join_and_Split,
combine_presences(Species_Presences),
Clean_Coord(Species_Presences),
pattern = map(Species_Presences)),
tar_target(Cleaned_Coordinates,
Clean_Coord(Join_and_Split),
pattern = map(Join_and_Split),
iteration = "group",
error = "null")
)
But then it runs combine_presences on each data frame separately, which defeats the purpose.
Any help would be greatly appreciated.

[1]: https://i.stack.imgur.com/gbBtN.png