We can actually do this in a way that might at first seem a little weird, but is relatively straight forward. Let's start by working with just data frames a
and b
and let's simplify things a bit. Let's assume that the id
variable in both a
and b
are the following:
a_id <- 1:5
b_id <- 1:5
In this simple example, a_id
and b_id
are exactly identical. What we want to know is how many times each of the values in a_id
shows up in b_id
. We obviously know the answer is one time each, but how do we get R to tell us that? That's where the table
function can come in handy:
table(a_id, b_id)
# b_id
# a_id 1 2 3 4 5
# 1 1 0 0 0 0
# 2 0 1 0 0 0
# 3 0 0 1 0 0
# 4 0 0 0 1 0
# 5 0 0 0 0 1
That might look a little ugly, but you can see that we have our b_id
s on the top (1-5) and our a_id
s on the left-hand side. Down the diagonal, we see the counts for how many times each value of a_id
shows up in b_id
, and it's 1 each just like we already knew. So how do we get just that information? R has a nice function called diag
that gets the main diagonal for us:
diag(table(a_id, b_id))
# 1 2 3 4 5
# 1 1 1 1 1
And there we have it. A vector with our "countif" values. But what if b_id
doesn't have all of the values that are in a_id
? If we try to do what we just did, we'll get an error because table
doesn't like it when two vectors have different lengths. So we modify it a bit:
a_id <- 1:10
b_id <- 4:8
table(b_id[b_id %in% a_id])
# 4 5 6 7 8
# 1 1 1 1 1
A couple new things here. The use of %in%
just asks R to tell us if a value exists in a vector. For example, 1 %in% 1:3
would return TRUE
, but 4 %in% 1:3
would return FALSE
. Next, you'll notice that we indexed b_id
by using [
. This only returns of the values of b_id
where b_id %in% a_id
is TRUE
, which in this case is all of b_id
.
So what does this look like if we expect more than one value of each a_id
in b_id
, but not every a_id
value to be in b_id
? Let's look at a more real example:
a_id <- 1:10
b_id <- sample(3:7, 1000, replace=TRUE)
table(b_id[b_id %in% a_id])
# 3 4 5 6 7
# 210 182 216 177 215
Like I said, it might seem a little weird at first, but it's relatively straight forward. Hopefully this helps you more than it confuses you.