Similar questions have been asked here and here. However, they don't solve my specific problem.
I am trying to count pairs of observations in a data frame, where the data frame is grouped by two other variables.
For example, if I have a data frame like the one below:
library(dplyr)
set.seed(100)
dft <- data.frame(
  var = sample(LETTERS[1:5], 10, replace = TRUE),
  num = c(1, 1, 2, 1, 1, 1, 2, 1, 1, 1),
  iter = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
)
dft <- dft %>%
  group_by(iter, num)
> dft
   var num iter
1    B   1    1
2    C   1    1
3    A   2    1
4    B   1    2
5    D   1    2
6    D   1    2
7    B   2    2
8    C   1    3
9    B   1    3
10   E   1    3
In my example, a pair consists of an observation and the one preceding it. E.g., if we have something like B, C, B, A in one grouping, the pairs would be B:C, C:B and B:A.
We can see that the pair B:C appears once when iter == 1 and num == 1.
The pairs B:D and D:D appear once each when iter == 2 and num == 1.
The pairs C:B and B:E appear once each when iter == 3 and num == 1.
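To make the pairing rule concrete, here is a minimal base R sketch for a single vector. The helper name make_pairs is hypothetical (it does not come from any package); it simply pastes each element onto the one that follows it.

```r
# Hypothetical helper: pair each element with the one that follows it,
# so c("B", "C", "B", "A") yields "B:C", "C:B", "B:A".
make_pairs <- function(x) {
  # A vector with fewer than two elements contains no pairs.
  if (length(x) < 2) return(character(0))
  # head(x, -1) drops the last element, tail(x, -1) drops the first.
  paste(head(x, -1), tail(x, -1), sep = ":")
}

make_pairs(c("B", "C", "B", "A"))
# [1] "B:C" "C:B" "B:A"
```

A grouping with only one observation produces no pairs at all.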
I was thinking of doing something like this:
g1 <- expand.grid(dft$var, sort(dft$var), iter = dft$iter)
g1$count <- NA
and then filling the g1$count column with the number of times each pair appears. However, I can't figure out how to actually count the pairs by group.
Additionally, reversed pairings are not equivalent. For instance, in my example the pair B:E is not equivalent to the pair E:B.
Any suggestions as to how I can count these pairs?
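For reference, the kind of result I am after could be sketched with dplyr roughly as follows. This is only a sketch under a few assumptions: the var column is hard-coded to the values printed above (so it does not depend on the random seed), the column names from, to and count are made up for illustration, and lag() is used within each iter/num group to pick up the preceding observation.

```r
library(dplyr)

# Same data as above, with var hard-coded to the printed values.
dft <- data.frame(
  var  = c("B", "C", "A", "B", "D", "D", "B", "C", "B", "E"),
  num  = c(1, 1, 2, 1, 1, 1, 2, 1, 1, 1),
  iter = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
)

pair_counts <- dft %>%
  group_by(iter, num) %>%
  # 'from' is the preceding observation within the group, 'to' the current one.
  mutate(from = lag(var), to = var) %>%
  # The first row of each group has no predecessor, so it forms no pair.
  filter(!is.na(from)) %>%
  ungroup() %>%
  # Order matters: from = "B", to = "E" is counted separately from E then B.
  count(iter, num, from, to, name = "count")

pair_counts
```

On this data that should give five rows, each with a count of 1, and keeping from and to as separate columns means B:E and E:B are never merged.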