I want to join a data frame to another data frame first by one column, then, if there are no "matches" I want it to try to join by another column. The problem is similar to this question, but I'm trying to get a slightly different output.
Here are my 'observations'
#my df
my_plants <- data.frame(scientific_name = c("Abelmoschus esculentus",
"Abies balsamea",
"Ammophila breviligulata",
"Zigadenus glaucus"),
percent_cover = c(90, 80, 10, 60))
and here is the main list
with some data that I want to extract for each of my observations. Obviously this is simplified.
#hypothetical database
plant_database <- data.frame(scientific_name = c("Abelmoschus esculentus",
"Abies balsamea",
"Ammophila breviligulata",
"Anticlea elegans"),
synonym = c(NA_character_,
NA_character_,
NA_character_,
"Zigadenus glaucus"),
score = c(1, 1, 2, 6))
Here is a function to join my observations to the main list. Note: I'm using a left_join because I want to know which observations were not matched.
#joining function
joining_fun <- function(plants, database) {
database_long <- database %>%
dplyr::mutate(ID = row.names(.)) %>%
tidyr::pivot_longer(., cols = c(scientific_name, synonym),
values_to = "scientific_name")
join <- dplyr::left_join(plants, database_long, by = "scientific_name") %>%
dplyr::select(-name)
return(join)
}
Which gets me here:
joining_fun(my_plants, plant_database)
scientific_name percent_cover score ID
1 Abelmoschus esculentus 90 1 1
2 Abies balsamea 80 1 2
3 Ammophila breviligulata 10 2 3
4 Zigadenus glaucus 60 6 4
but I want something like this:
scientific_name synonym percent_cover score ID
Abelmoschus esculentus NA 90 1 1
Abies balsamea NA 80 1 2
Ammophila breviligulata NA 10 2 3
Anticlea elegans Zigadenus glaucus 60 6 4
Thanks!