
The problem

Hello everyone, I am new to the targets package and I am trying to build a workflow. The workflow relies heavily on a package I am developing called SDMWorkflows, which you can install from GitHub using remotes:

remotes::install_github("Sustainscapes/SDMWorkflows")

I put the code and dataset here in this repo

What I am trying to do is download, clean, and then model several species. So far I am using these functions, which I added to the R folder:

Read_Data <- function(Data){
  # Read from the path passed in (the tracked file target), not a hard-coded name
  readr::read_csv(Data) |> 
    dplyr::pull(Species)
}


DownloadSpecies <- function(Cleaned){
  SDMWorkflows::GetOccs(Species = Cleaned,
                        continent = "europe",
                        limit = 1000, 
                        WriteFile = FALSE)
}

Clean_Coord <- function(Coords){
  SDMWorkflows::clean_presences(Coords[[1]])
}

and the following pipeline:

library(targets)

tar_option_set(
  packages = c("SDMWorkflows", "readxl", "janitor", "dplyr", "purrr"),
  format = "rds" # default storage format
)


tar_source()

list(
  tar_target(file, "Species.csv", format = "file"),
  tar_target(data, Read_Data(file)),
  tar_target(
    Species_Presences,
    DownloadSpecies(data),
    pattern = map(data),
    iteration = "list"
  ),
  tar_target(Cleaned_Coordinates,
             Clean_Coord(Species_Presences),
             pattern = map(Species_Presences),
             iteration = "group",
             error = "null")
)

The problem is that some of the Species_Presences returned by the DownloadSpecies function are empty data frames, which I expect will happen for some of the 30,000 species I will test later. I am trying to figure out how to keep going after that, because setting error = "null" apparently does not let me continue with the rest of the workflow.

[![enter image description here][1]][1]
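
To make the empty-result case concrete, here is a minimal base R sketch of the kind of guard I have in mind: skip the cleaning step when a download comes back empty. The helper names `is_empty_occurrence()` and `clean_if_present()` are hypothetical (they are not part of SDMWorkflows or targets), and `identity` stands in for the real cleaning function.

```r
# Hypothetical helper: TRUE when a download result has nothing to clean.
is_empty_occurrence <- function(x) {
  is.null(x) || !is.data.frame(x) || nrow(x) == 0
}

# Hypothetical wrapper: run clean_fun only when there are rows to clean,
# otherwise return NULL so the caller can drop the species later.
clean_if_present <- function(coords, clean_fun) {
  if (is_empty_occurrence(coords)) {
    return(NULL)
  }
  clean_fun(coords)
}

# Toy data standing in for two download results, the second one empty.
occs <- list(
  data.frame(species = "a", decimalLatitude = 1, decimalLongitude = 2),
  data.frame()
)

# identity() is a placeholder for the real cleaning function.
cleaned <- lapply(occs, clean_if_present, clean_fun = identity)
```

Whether returning NULL from a branch is enough to keep the downstream targets running is exactly what I am unsure about.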

What I have tried

First try

I tried adding a function called combine_presences in between. The idea was to use purrr::reduce(bind_rows) to join all the presences and then split them again by species, giving a new object to map over that contains only the species that still have presences before doing the coordinate cleaning. The function looks like this:

combine_presences <- function(Species) {
  Species[[1]][[1]] |> 
    dplyr::select("key", "scientificName", "decimalLatitude", "decimalLongitude", "kingdom", "phylum", "order", "family", "genus", "species") |> 
    dplyr::group_split(species)
}
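
For reference, the bind-then-split idea described above can be sketched on toy data as follows. This is only an illustration under assumptions: `bind_and_split()` is a hypothetical name, and each list element is assumed to be a plain data frame with a `species` column (the real download results may be nested and need unwrapping first).

```r
library(dplyr)
library(purrr)

# Hypothetical helper: drop empty results, stack the rest, split by species.
bind_and_split <- function(presences) {
  presences |>
    purrr::keep(\(x) is.data.frame(x) && nrow(x) > 0) |>
    dplyr::bind_rows() |>
    dplyr::group_split(species)
}

# Toy stand-in for three download results, one of them empty.
toy <- list(
  data.frame(species = "Abies alba",
             decimalLatitude = c(47.1, 46.8),
             decimalLongitude = c(8.2, 9.0)),
  data.frame(),
  data.frame(species = "Fagus sylvatica",
             decimalLatitude = 55.7,
             decimalLongitude = 12.5)
)

per_species <- bind_and_split(toy)
length(per_species) # one element per species that still has presences
```

On the toy input the empty element is dropped and the result has one tibble per remaining species.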

and modifying the targets code as follows:

list(
  tar_target(file, "Species.csv", format = "file"),
  tar_target(data, Read_Data(file)),
  tar_target(
    Species_Presences,
    DownloadSpecies(data),
    pattern = map(data),
    iteration = "list"
  ),
  tar_target(Join_and_Split,
             combine_presences(Species_Presences)),
  tar_target(Cleaned_Coordinates,
             Clean_Coord(Join_and_Split),
             pattern = map(Join_and_Split),
             iteration = "group",
             error = "null")
)

However, with this I only get one species out of the whole set.

Second try

I tried to map it like this:

list(
  tar_target(file, "Species.csv", format = "file"),
  tar_target(data, Read_Data(file)),
  tar_target(
    Species_Presences,
    DownloadSpecies(data),
    pattern = map(data),
    iteration = "list"
  ),
  tar_target(Join_and_Split,
             combine_presences(Species_Presences),
             pattern = map(Species_Presences)),
  tar_target(Cleaned_Coordinates,
             Clean_Coord(Join_and_Split),
             pattern = map(Join_and_Split),
             iteration = "group",
             error = "null")
)

But then it runs combine_presences separately for each data frame, which defeats the purpose.

Any help would be greatly appreciated.

[1]: https://i.stack.imgur.com/gbBtN.png

Derek Corcoran