I want to count the unique IDs that appear in every one of several data.frames

Data looks like this:

df1: KeyID
       x
       x
       y
       y
       z

df2: KeyID
       x
       x
       x
       z
       z

df3: KeyID
       x
       y
       y
       z

I want to count the number of unique matches between data frames.

The output would look like this: 2

since x and z are the only KeyIDs that appear in every data frame.

I have done this but want to know if there is a faster way:

df1.2 <- df2[df2$KeyID %in% df1$KeyID,]
length(unique(df1.2$KeyID))

Any thoughts?

CoryB
  • Wouldn't this just be `sum(unique(df1$KeyID) %in% unique(df2$KeyID))`? Actually, now that I think about it more, you might need to wrap `as.character` around them if they're factors. – IRTFM Jul 30 '14 at 19:42
  • @BondedDust only if position doesn't matter... as it stands now, it is hard to tell what the asker wants... – Justin Jul 30 '14 at 19:46
  • Position does not matter...just want to know how many of KeyIDs are in both data frames, or in dfs 1,2,3 and so on. – CoryB Jul 30 '14 at 19:48
  • You mentioned multiple datasets/vectors. If there are more than two: `Reduce("intersect", listofvectors)` – akrun Jul 30 '14 at 19:56
  • @akrun Is reduce part of a package? – CoryB Jul 30 '14 at 20:06
  • @CoryB It's in the base package, but be careful of capitalization, the command is `Reduce`. – tkmckenzie Jul 30 '14 at 20:08
  • @tkmckenzie I tried `Reduce`, but it is not happy: `Reduce("intersect",PCS$KeyID,Spr.1104$KeyID)` – CoryB Jul 30 '14 at 20:12
  • @CoryB You need to put the vectors in a list first, see the edit to my answer below. – tkmckenzie Jul 30 '14 at 20:14
  • @tkmckenzie Fixed it! `length(Reduce("intersect",list(PCS$KeyID,Spr.1104$KeyID)))` – CoryB Jul 30 '14 at 20:14

1 Answer

You can compute the set intersection with `intersect`, which returns each shared value only once:

v1 <- c("x", "x", "y", "y", "z")
v2 <- c("x", "x", "x", "z", "z")
intersect(v1, v2)
# [1] "x" "z"
length(intersect(v1, v2))
# [1] 2

Edit: Following the question edit and akrun's suggestion, if there are more than two vectors, put them in a list and fold `intersect` over it with `Reduce`:

v1 <- c("x", "x", "y", "y", "z")
v2 <- c("x", "x", "x", "z", "z")
v3 <- c("x", "y", "y", "z")
vector.list <- list(v1, v2, v3)

Reduce("intersect", vector.list)
# [1] "x" "z"
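
Since the question works with data frames rather than bare vectors, the same idea can be sketched directly on the `KeyID` columns. This is only a minimal illustration under some assumptions: `count_shared_ids` is a hypothetical helper name (not a base R function), the data frames are rebuilt from the question's example, and the `as.character` call is just a precaution in case the columns are factors, as raised in the comments.

```r
# Hypothetical data frames mirroring the question's example.
df1 <- data.frame(KeyID = c("x", "x", "y", "y", "z"), stringsAsFactors = FALSE)
df2 <- data.frame(KeyID = c("x", "x", "x", "z", "z"), stringsAsFactors = FALSE)
df3 <- data.frame(KeyID = c("x", "y", "y", "z"), stringsAsFactors = FALSE)

# count_shared_ids is a hypothetical helper, not part of base R.
count_shared_ids <- function(dfs, key = "KeyID") {
  # Pull the key column out of each data frame as a character vector
  # (as.character is a precaution in case the column is a factor).
  keys <- lapply(dfs, function(d) as.character(d[[key]]))
  # Fold intersect() over the list; the result holds the IDs found in every frame.
  length(Reduce(intersect, keys))
}

n_shared <- count_shared_ids(list(df1, df2, df3))
print(n_shared)
# [1] 2
```

Because `intersect` already returns unique values, no separate `unique` call is needed before taking the length.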
tkmckenzie