1
df1 = data.frame(Id = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)), Test = NA)
df2 = data.frame(Id = c(2, 4, 6, 7), State = c(rep("Alabama", 2), rep("Ohio", 2)))

df_sum <- anti_join(df1,df2, by = "Id") %>% bind_rows(df2)

Is there an easy way to make the result of the anti_join so that only rows of df1 are present in the result. So for example the row with the Id 7 is dropped in the resulting dataframe. I can only think of laborious solutions. Thanks in advance.

The resulting table should look like this:

Result

camille
  • 16,432
  • 18
  • 38
  • 60
BroNKo
  • 39
  • 7
  • Just remove the `%>% bind_rows(df2)` part, or I don't understand the question. `anti_join(df1,df2, by = "Id")` will already give you only the lines of `df1` which are not in `df2`. If this isn't the result you expect, could you rephrase the question or provide your expected result. – Alexandre Léonard Aug 27 '21 at 18:36
  • Why did you bind df2 onto your data if you don't want it there? – camille Aug 27 '21 at 18:41
  • I'm guessing you actually want this result: `merge(df1, df2, all.x = T)` – Alexandre Léonard Aug 27 '21 at 18:41
  • This line will give you what you want: `anti_join(df1,df2, by = "Id") %>% bind_rows(df2) %>% filter(Id %in% df1$Id)` – Alexandre Léonard Aug 27 '21 at 18:45

3 Answers3

0

In your desirable output all rows and columns from both datasets are retained.

That is why I guess full_join might help you.

full_join return all rows and all columns from both x and y. Where there are not matching values, returns NA for the one missing.

library(tidyverse)
new_df <- df1 %>% full_join(df2)

anti_join function retains only those rows from df1 that do not intersect with rows in df2.

rg4s
  • 811
  • 5
  • 22
  • It is not true that "In your desirable output all rows and columns from both datasets are retained." There is no row with Id = 7 in the desired output. – Alexandre Léonard Aug 27 '21 at 18:49
0

That's a strange result that you expect but I don't know what your plan is, so this line of code will give you exactly the table you want:

anti_join(df1,df2, by = "Id") %>% bind_rows(df2) %>% filter(Id %in% df1$Id)

I've just added a filter so that only the Ids present inf df1 are retained.

0

Let's join only:

library(dplyr)
x <- df1 %>% 
    anti_join(df2, by = "Id") 

y <- df2 %>%  
    semi_join(df1, by = "Id") 

x %>% 
    full_join(y, by="Id")
 Id Product Test   State
1  1 Toaster   NA    <NA>
2  3 Toaster   NA    <NA>
3  5   Radio   NA    <NA>
4  2    <NA>   NA Alabama
5  4    <NA>   NA Alabama
6  6    <NA>   NA    Ohio
TarJae
  • 72,363
  • 6
  • 19
  • 66