0

I'm trying to compare columns from two data frames to extract items that appear in both. Specifically:

df1:
 state group species
1 CA 2 cat, dog, chicken, mouse
2 CA 1 cat
3 NV 1 dog, chicken
4 NV 2 chicken
5 WA 1 chicken, rat, mouse, lion
6 WA 2 dog, cat
7 WA 3 dog, chicken
8 WA 4 cat, chicken

df2:
 state special_species
1 CA cat
2 CA chicken
3 CA mouse
4 WA cat
5 WA chicken
6 NV dog

I'm interested in determining which of the special_species from df2 are present in df1. I'd like a new dataframe that has the state, group, and the special species. I think it should be a combination of a join, group_by, and summarize, but I can't seem to get that to work.

nso921
  • 31
  • 3

1 Answers1

2

We can split the 'species' column in 'df1' by , with separate_rows and then do a join

library(tidyr)
library(dplyr)
separate_rows(df1, species) %>%
    select(-group) %>%
    distinct %>%
    intersect(set_names(df2, c('state', 'species')))
#  state species
#1    CA     cat
#2    CA chicken
#3    CA   mouse
#4    NV     dog
#5    WA chicken
#6    WA     cat

Or do a inner_join

separate_rows(df1, species) %>%
   select(-group) %>%
   distinct %>%
    inner_join(df2, by = c('state', 'species' = 'special_species'))

data

df1 <- structure(list(state = c("CA", "CA", "NV", "NV", "WA", "WA", 
"WA", "WA"), group = c(2L, 1L, 1L, 2L, 1L, 2L, 3L, 4L), species = c("cat, dog, chicken, mouse", 
"cat", "dog, chicken", "chicken", "chicken, rat, mouse, lion", 
"dog, cat", "dog, chicken", "cat, chicken")), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))

df2 <- structure(list(state = c("CA", "CA", "CA", "WA", "WA", "NV"), 
    special_species = c("cat", "chicken", "mouse", "cat", "chicken", 
    "dog")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6"))
akrun
  • 874,273
  • 37
  • 540
  • 662