I am trying to find out if 3 datasets (df1, df2, df3) have any common rows (i.e. entire row is a duplicate).
I figured out how to do this for pairs of 2 datasets:
df1 = data.frame(id = c(1,2,3), names = c("john", "alex", "peter"))
df2 = data.frame(id = c(1,2,3), names = c("alex", "john", "peter"))
df3 = data.frame(id = c(1,2,3), names = c("peter", "john", "tim"))
library(dplyr)
inner_join(df1, df2)
inner_join(df1, df3)
inner_join(df2, df3)
- Is it possible to do this for 3 datasets all at once?
The straightforward way does not seem to work:
inner_join(df1, df2, df3)
Error in `[.data.frame`(by, c("x", "y")) : undefined columns selected
I thought I had found a way to do this:
library(plyr)
join_all(list(df1, df2, df3), type='inner')
But this is telling me that there are no common rows (i.e. same id, same name) between these 3 dataframes:
Joining by: id, names
Joining by: id, names
[1] id names
<0 rows> (or 0-length row.names)
This is not correct, seeing as in the example I created:
- Row 3 for df1 and df2 are identical (id = 3,name = peter)
- Row 2 for df2 and df3 are identical (id = 2, name= john)
I am trying to find a way to determine if these 3 datasets share any common rows. Can this be done in R?
Thank you!