Count pairs by groups in R?

Question

Similar questions have been asked here and here. However, they don't solve my specific problem.

I am trying to count pairs of observations in a data frame, where the data frame is grouped by two other variables.

For example, if I have a data frame like the one below:

library(dplyr)

set.seed(100)
dft <- data.frame(
  var = sample(LETTERS[1:5], 10, replace = TRUE),
  num = c(1,1,2,1,1,1,2,1,1,1),
  iter = c(1,1,1,2,2,2,2,3,3,3)
)

dft <- dft %>% 
  group_by(iter, num)

> dft
   var num iter
1    B   1    1
2    C   1    1
3    A   2    1
4    B   1    2
5    D   1    2
6    D   1    2
7    B   2    2
8    C   1    3
9    B   1    3
10   E   1    3

In my example, a pair is an observation together with the one that follows it within the same group. E.g., if we have something like B,C,B,A in one grouping, the pairs would be B:C, C:B and B:A.
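To make that definition concrete, here is a minimal sketch in base R. The helper name `consecutive_pairs` and the vector `x` are made up for illustration; they are not part of the data above.

```r
# Hypothetical helper: pair each element with its successor, keeping the order.
consecutive_pairs <- function(x) {
  # fewer than two elements means there is nothing to pair
  if (length(x) < 2L) return(character(0))
  paste(x[-length(x)], x[-1], sep = ":")
}

x <- c("B", "C", "B", "A")
pairs <- consecutive_pairs(x)
print(pairs)
# [1] "B:C" "C:B" "B:A"
```

Because the first element of each pair always comes from the earlier row, B:C and C:B stay distinct.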

We can see that the pair B:C appears once when iter == 1 and num == 1.

The pairs B:D and D:D appear once each when iter == 2 and num == 1.

The pairs C:B and B:E appear once each when iter == 3 and num == 1.

I was thinking of doing something like this:

g1 <- expand.grid(dft$var, sort(dft$var), iter = dft$iter)
g1$count <- NA

and then filling the g1$count column with the number of times each pair appears. However, I can't figure out a way to actually count the pairs by group.

Additionally, reversed pairings are not equivalent. For example, the pair B:E is not equivalent to the pair E:B.

Any suggestions as to how I can count these pairs?

The df is grouped because I need the pairs for each `iter`. E.g., `B:C` might appear in `iter = 1` and `iter = 2`, but I can't count `B:C` twice here because it appears in separate `iter`s. — Electrino, Feb 25 '22 at 19:07
How do you guarantee it's always a pair? I.e. what if there are 3 entries within one grouping? — Dan Adams, Feb 25 '22 at 19:08
Adding an expected output for this example is kind of unfeasible. For example, when I use `expand.grid` I have around 750 rows to fill in, so it's hard to manually create an expected output. — Electrino, Feb 25 '22 at 19:08
In my example there are 3 entries in one grouping. But the pairs are always counted as the observation and the one preceding it. Sorry, I'll edit my question to include that — Electrino, Feb 25 '22 at 19:10

score 3 · Answer 1 · answered Feb 25 '22 at 19:20

Here is a base R solution.

set.seed(100)
dft <- data.frame(
  var = sample(LETTERS[1:5], 10, replace = TRUE),
  num = c(1,1,2,1,1,1,2,1,1,1),
  iter = c(1,1,1,2,2,2,2,3,3,3)
)

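# one character vector of `var` per num/iter combination, named "num.iter"
sp <- split(dft$var, list(dft$num, dft$iter))
# tabulate each element pasted with its successor
res <- lapply(sp, \(x){
  table(paste(x[-length(x)], x[-1], sep = ":"))
})
# drop combinations with fewer than two observations (no pairs)
res <- res[sapply(res, nrow) > 0L]
# turn each table into a data frame and recover num and iter from its name
res <- lapply(seq_along(res), \(i){
  nms <- strsplit(names(res)[i], "\\.")[[1]]
  dat <- as.data.frame(res[[i]], responseName = "count")
  cbind(dat, num = nms[1], iter = nms[2])
})
res <- do.call(rbind, res)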
res
#>   Var1 count num iter
#> 1  B:C     1   1    1
#> 2  B:D     1   1    2
#> 3  D:D     1   1    2
#> 4  B:E     1   1    3
#> 5  C:B     1   1    3

Created on 2022-02-25 by the reprex package (v2.0.1)

Shouldn't `B:D` appear twice? I.e., for `B, D, D`: B and the first D, then B and the second D. — Onyambu, Feb 25 '22 at 19:24
@Onyambu No, not according to the question. (2nd line of the expected counts). Pairs are consecutive vector elements. — Rui Barradas, Feb 25 '22 at 19:26

score 3 · Accepted Answer · answered Feb 25 '22 at 19:36

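library(dplyr)
library(stringr)  # for str_detect()

dft %>%
  group_by(iter, num) %>%
  # pair each value with the next one in its group; the last value is paired with ''
  summarise(nn = paste(var, lead(var, default = ''), sep = ':'),
            .groups = 'keep') %>%
  count(nn) %>%
  # drop the incomplete trailing pairs such as "C:"
  filter(str_detect(nn, '.:.'))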

# A tibble: 5 x 4
# Groups:   iter, num [3]
   iter   num nn        n
  <dbl> <dbl> <chr> <int>
1     1     1 B:C       1
2     2     1 B:D       1
3     2     1 D:D       1
4     3     1 B:E       1
5     3     1 C:B       1
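
For comparison, here is a hedged sketch of the same idea using `mutate()` instead of a multi-row `summarise()`, which recent dplyr releases warn about. It is not the answerer's code. The `var` column is hard-coded to match the printout in the question so the result does not depend on the random number generator, and the names `nxt`, `pair` and `pair_counts` are made up for illustration.

```r
library(dplyr)

# Data hard-coded to match the data frame printed in the question.
dft <- data.frame(
  var  = c("B", "C", "A", "B", "D", "D", "B", "C", "B", "E"),
  num  = c(1, 1, 2, 1, 1, 1, 2, 1, 1, 1),
  iter = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
)

pair_counts <- dft %>%
  group_by(iter, num) %>%
  # lead() looks one row ahead within the group; the last row of a group gets NA
  mutate(nxt = lead(var)) %>%
  ungroup() %>%
  # rows without a successor do not form a pair
  filter(!is.na(nxt)) %>%
  mutate(pair = paste(var, nxt, sep = ":")) %>%
  # count each ordered pair separately within every iter/num combination
  count(iter, num, pair)

print(pair_counts)
```

On this data the result has the same five pairs as above, each with a count of 1. Filtering on `NA` rather than on a regular expression also removes the need for stringr.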
