For each subsequence length we call freqs which, for each row of m, invokes rollapply to get the successive subsequences of that length. ag contains each distinct subsequence along with its frequency, and finally we omit subsequences whose frequency is below minFreq to keep the size down.
In the last line of code we call freqs successively with k (the subsequence length) equal to 4, 3, 2 and 1 to get subsequences of those lengths; change 4:1 to whatever lengths you want. rbind.fill from plyr stacks the results and fills the columns that the shorter subsequences lack with NA. Also in that line, omit minFreq = 2 if you want all the frequencies and not just those that are at least 2. (We used a minimum of 2 to keep the output size reasonable.)
library(plyr)
library(zoo)
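freqs <- function(k, m, minFreq = 1) {
  # one row per length-k window, taken row by row from m
  tuples <- if (k == 1) matrix(m)
  else do.call("rbind", lapply(split(m, row(m)), rollapply, k, c))
  # count how many times each distinct window occurs
  ag <- aggregate(list(freq = 1:nrow(tuples)), as.data.frame(tuples), length)
  # keep only the windows that occur at least minFreq times
  subset(ag, freq >= minFreq)
}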
do.call("rbind.fill", lapply(4:1, freqs, m, minFreq = 2))
giving:
V1 V2 V3 V4 freq
1 1 2 2 3 2
2 2 2 3 3 3
3 2 3 3 4 2
4 2 3 4 6 2
5 3 4 6 6 2
6 1 2 2 NA 2
7 1 2 3 NA 2
8 2 2 3 NA 4
9 2 3 3 NA 4
10 2 3 4 NA 3
11 3 3 4 NA 2
12 3 4 5 NA 2
13 3 4 6 NA 3
14 4 6 6 NA 2
15 7 7 7 NA 2
16 1 1 NA NA 2
17 1 2 NA NA 4
18 2 2 NA NA 4
19 2 3 NA NA 7
20 3 3 NA NA 4
21 3 4 NA NA 6
22 4 5 NA NA 2
23 4 6 NA NA 3
24 6 6 NA NA 3
25 6 7 NA NA 3
26 7 7 NA NA 4
27 1 NA NA NA 7
28 2 NA NA NA 11
29 3 NA NA NA 12
30 4 NA NA NA 6
31 5 NA NA NA 3
32 6 NA NA NA 8
33 7 NA NA NA 9
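To see what the windowing and counting steps inside freqs do, here is a minimal sketch on a toy matrix. The matrix toy and the helper windows_of are hypothetical names introduced only for this illustration, and base R's embed is swapped in for rollapply so that the sketch runs without zoo; it is meant to mirror the idea, not to replace the code above.

```r
# Hypothetical 2-row toy input (not the question's data)
toy <- matrix(c(1, 2, 2, 3,
                2, 2, 3, 3), nrow = 2, byrow = TRUE)
k <- 2

# embed() returns each length-k window in reverse order,
# so the columns are flipped to restore left-to-right order
windows_of <- function(x, k) embed(x, k)[, k:1, drop = FALSE]

# one row per window, collected row by row as in freqs
tuples <- do.call(rbind, lapply(split(toy, row(toy)), windows_of, k))

# count how often each distinct window occurs
ag <- aggregate(list(freq = seq_len(nrow(tuples))), as.data.frame(tuples), length)
ag
```

Each row of toy contributes three windows of length 2, so tuples has six rows; the pairs (2, 2) and (2, 3) are each counted twice, while (1, 2) and (3, 3) are counted once.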
Note
In the question the input is called df, suggesting it is a data frame, but the display of it in the question shows that it is, in fact, a matrix. For the sake of reproducibility, this is the matrix used in the computations above:
m <- matrix(c(1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 2L, 2L, 4L, 3L, 3L, 2L, 3L, 4L, 3L, 3L, 7L, 4L,
3L, 3L, 4L, 6L, 3L, 3L, 7L, 5L, 4L, 3L, 6L, 7L, 5L, 4L, 7L, 6L,
6L, 6L, 6L, 7L, 7L, 5L, 7L, 7L, 6L, 6L, 7L), 8)
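If your own input really is a data frame, one cautious option is to convert it to a matrix before calling freqs, since the code above was only run against a matrix. The sketch below uses a hypothetical data frame and the hypothetical name m_from_df; it assumes all columns are numeric.

```r
# Hypothetical data frame standing in for a data-frame input
df <- data.frame(a = c(1L, 2L), b = c(2L, 3L), c = c(3L, 3L))

# as.matrix() yields a plain matrix when every column is numeric
m_from_df <- as.matrix(df)

is.matrix(m_from_df)
dim(m_from_df)
```

The resulting matrix could then be passed as the m argument of freqs.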