R: Counting the number of matches between multiple data frames

Question

I want to count the unique KeyID values that appear in every one of several data.frames.

Data looks like this:

df1: KeyID
       x
       x
       y
       y
       z

df2: KeyID
       x
       x
       x
       z
       z

df3: KeyID
       x
       y
       y
       z

I want to count the number of unique matches between data frames.

The output would be: 2

since x and z are the only KeyIDs common to all of the sets.

I have done this but want to know if there is a faster way:

df1.2 <- df2[df2$KeyID %in% df1$KeyID,]
length(unique(df1.2$KeyID))

Any thoughts?

Wouldn't this just be `sum(unique(df1$KeyID) %in% unique(df2$KeyID))`? Actually now that I think about it more, you might need to wrap `as.character` around them if they're factors. — IRTFM, Jul 30 '14 at 19:42
@BondedDust only if position doesn't matter... as it stands now, it is hard to tell what the asker wants... — Justin, Jul 30 '14 at 19:46
Position does not matter... I just want to know how many of the KeyIDs are in both data frames, or in dfs 1, 2, 3 and so on. — CoryB, Jul 30 '14 at 19:48
You mentioned multiple datasets/vectors. If more than two: `Reduce("intersect", listofvectors)` — akrun, Jul 30 '14 at 19:56
@CoryB It's in the base package, but be careful of capitalization, the command is `Reduce`. — tkmckenzie, Jul 30 '14 at 20:08
@tkmckenzie I tried Reduce however it is not happy : Reduce("intersect",PCS$KeyID,Spr.1104$KeyID) — CoryB, Jul 30 '14 at 20:12
@CoryB You need to put the vectors in a list first, see the edit to my answer below. — tkmckenzie, Jul 30 '14 at 20:14
@tkmckenzie Fixed it! length(Reduce("intersect",list(PCS$KeyID,Spr.1104$KeyID))) — CoryB, Jul 30 '14 at 20:14

tkmckenzie · Accepted Answer · 2014-07-30T20:14:18.193

You can do set intersection with intersect:

v1 <- c("x", "x", "y", "y", "z")
v2 <- c("x", "x", "x", "z", "z")
intersect(v1, v2)
# [1] "x" "z"
length(intersect(v1, v2))
# [1] 2
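
Applied to the data frames from the question, the same idea might look like the sketch below. The data frames are rebuilt here from the example data, and `stringsAsFactors = FALSE` is an assumption so that KeyID stays a character column; `n_common` and `n_asker` are illustrative names only.

```r
# Rebuild the question's example data (assumption: KeyID is stored as character)
df1 <- data.frame(KeyID = c("x", "x", "y", "y", "z"), stringsAsFactors = FALSE)
df2 <- data.frame(KeyID = c("x", "x", "x", "z", "z"), stringsAsFactors = FALSE)

# Number of distinct KeyIDs present in both data frames
n_common <- length(intersect(df1$KeyID, df2$KeyID))
n_common
# [1] 2

# The %in% approach from the question, written on the column itself,
# should give the same count
n_asker <- length(unique(df2$KeyID[df2$KeyID %in% df1$KeyID]))
identical(n_common, n_asker)
# [1] TRUE
```

Working on the column directly avoids building an intermediate subset of the data frame just to count its keys.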

Edit: Following the question edit and akrun's suggestion, if there are more than two vectors, put them in a list and apply Reduce:

v1 <- c("x", "x", "y", "y", "z")
v2 <- c("x", "x", "x", "z", "z")
v3 <- c("x", "y", "y", "z")
vector.list <- list(v1, v2, v3)

Reduce("intersect", vector.list)
# [1] "x" "z"
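
To get the count the question asks for straight from the data frames, one possible sketch is to pull the KeyID column out of each data frame with `lapply` and take the length of the reduced intersection. The data frames are rebuilt from the question's example, and `df.list`, `key.list` and `common` are hypothetical names used only for illustration.

```r
# Rebuild the question's three example data frames (assumption: character KeyID)
df1 <- data.frame(KeyID = c("x", "x", "y", "y", "z"), stringsAsFactors = FALSE)
df2 <- data.frame(KeyID = c("x", "x", "x", "z", "z"), stringsAsFactors = FALSE)
df3 <- data.frame(KeyID = c("x", "y", "y", "z"), stringsAsFactors = FALSE)

# Collect the data frames, then extract the KeyID vector from each one
df.list <- list(df1, df2, df3)
key.list <- lapply(df.list, function(d) d$KeyID)

# Intersect the vectors pairwise from left to right
common <- Reduce(intersect, key.list)
common
# [1] "x" "z"

# Number of KeyIDs shared by all data frames
length(common)
# [1] 2
```

Adding another data frame then only means appending it to the list.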

edited Jul 30 '14 at 20:14

answered Jul 30 '14 at 19:46

tkmckenzie


Again, only if position doesn't matter. – Justin Jul 30 '14 at 19:47
@CoryB See akrun's comment above. – tkmckenzie Jul 30 '14 at 20:07
