group by text in columns, look for common entries in two data frames

Question

I'm trying to compare columns from two data frames to extract items that appear in both. Specifically:

df1:
 state group species
1 CA 2 cat, dog, chicken, mouse
2 CA 1 cat
3 NV 1 dog, chicken
4 NV 2 chicken
5 WA 1 chicken, rat, mouse, lion
6 WA 2 dog, cat
7 WA 3 dog, chicken
8 WA 4 cat, chicken

df2:
 state special_species
1 CA cat
2 CA chicken
3 CA mouse
4 WA cat
5 WA chicken
6 NV dog

I'm interested in determining which of the special_species from df2 are present in df1. I'd like a new dataframe that has the state, group, and the special species. I think it should be a combination of a join, group_by, and summarize, but I can't seem to get that to work.

akrun · Answer 1 · 2019-12-03T00:53:51.507

We can split the 'species' column in 'df1' by , with separate_rows and then do a join

library(tidyr)
library(dplyr)
separate_rows(df1, species) %>%
    select(-group) %>%
    distinct %>%
    intersect(set_names(df2, c('state', 'species')))
#  state species
#1    CA     cat
#2    CA chicken
#3    CA   mouse
#4    NV     dog
#5    WA chicken
#6    WA     cat

Or do a inner_join

separate_rows(df1, species) %>%
   select(-group) %>%
   distinct %>%
    inner_join(df2, by = c('state', 'species' = 'special_species'))

data

df1 <- structure(list(state = c("CA", "CA", "NV", "NV", "WA", "WA", 
"WA", "WA"), group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L), species = c("cat, dog, chicken, mouse", 
"cat", "dog, chicken", "chicken", "chicken, rat, mouse, lion", 
"dog, cat", "dog, chicken", "cat, chicken")), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

df2 <- structure(list(state = c("CA", "CA", "CA", "WA", "WA", "NV"), 
    special_species = c("cat", "chicken", "mouse", "cat", "chicken", 
    "dog")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6"))

group by text in columns, look for common entries in two data frames

1 Answers1

data