Info: So I have 2 dataframes, we'll call them dfa and dfb. Dfa has a master list of things I'm interested in, and dfb has a list of things I want to check for in dfa if they already exist. I'm using a left join to carry over some information in a column from dfb (Note) to dfa based on whether or not the items in dfa existed in b already.
Goal: I want to create a column that says something like "True" or a numerical value if it is not included already in dfa. Or a value that denotes it exists in both files, either way is fine.
Use of dplyr preferred.
Example:
dfa <- data.frame(
ID = c(11,42,21,3,4),
Name = c("ab", "bc", "cd", "de","fg")
)
dfb <- data.frame(
ID = c(11,32,11,3),
Name = c("ab", "bb", "fd", "de"),
Note = c("blue","white","black","yellow")
)
join <- left_join(dfa, dfb, by = c("ID","Name")) %>%
mutate( new = case_when(dfa$ID %in% dfb$ID ~ "exists") )
This just makes a "new" col with NAs all through it. I tried another method:
join <- left_join(dfa, dfb, by = c("ID","Name")) %>%
mutate( new = case_when(dfa$ID == dfb$ID ~ "exists") )
This gave me an error about the length of the dfs because in real life the two dfs are not the same length long (rownum).
join <- left_join(dfa, dfb, by = c("ID","Name")) %>%
mutate( new = case_when(dfa$ID !%in% dfb$ID ~ "is new") )
Also gives me issues about df length.
Desired output would look like:
join:
ID Name new
11 "ab" "exists"
42 "bc"
21 "cd"
3 "de" "exists"
4 "fg"
32 "fd"
11 "de"
Or just 'is new' in the two bottom new columns. Same difference, help would be appreciated. Thanks!