Given your comment about wanting to match words, here is the start of a text-mining approach that might be helpful. The idea is to split each statement into its words and count how often each word occurs in each group.
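library(tidytext)
library(dplyr)
library(tidyr)
library(stringr)  # provides str_trim()
dtm <- data %>%
  # split the comma-separated `animal` column into one row per word
  unnest_tokens("word", "animal", token = "regex", pattern = ",") %>%
  # drop the whitespace left over after each comma
  mutate(word = str_trim(word)) %>%
  # count each word within each group, then spread the words into columns
  count(group, word) %>%
  pivot_wider(names_from = "word", values_from = "n", values_fill = list(n = 0))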
What you now have is a document-term matrix. This changes your problem from a regex matching one to finding the row vectors with the most matches.
# A tibble: 4 x 7
  group   cat   dog horse mouse   cow  frog
  <dbl> <int> <int> <int> <int> <int> <int>
1     1     1     1     1     1     0     0
2     2     1     1     1     0     0     0
3     3     1     1     0     0     0     0
4     4     1     1     0     0     1     1
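If you would rather try this without tidytext, roughly the same table can be built in base R. This is only a sketch: the `data` frame below is my assumption, reconstructed from the output above (your real data may look different), and `dtm_base` is just an illustrative name.

```r
# Assumed input, reconstructed from the tibble output; the real `data` may differ.
data <- data.frame(
  group  = 1:4,
  animal = c("cat, dog, horse, mouse",
             "cat, dog, horse",
             "cat, dog",
             "cat, dog, cow, frog"),
  stringsAsFactors = FALSE
)

# Split each statement on commas and trim the surrounding whitespace.
words <- lapply(strsplit(data$animal, ","), trimws)

# One row per (group, word) pair.
long <- data.frame(
  group = rep(data$group, lengths(words)),
  word  = unlist(words),
  stringsAsFactors = FALSE
)

# Fix the factor levels so the columns follow order of first appearance.
long$word <- factor(long$word, levels = unique(long$word))

# Cross-tabulate into a group x word count matrix.
dtm_base <- unclass(table(long$group, long$word))
dtm_base
```

The result is a plain integer matrix with the groups as row names, so it can go straight into the matrix multiplication without dropping a `group` column first.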
An easy thing to do would be to extract the matrix part and just multiply.
mat <- as.matrix(select(dtm, -group))
matches <- (mat %*% t(mat))
This gives you a group-by-group matrix of match counts. For example, row 1, column 2 shows the three word matches (cat, dog, and horse) between groups 1 and 2. The diagonal is just each group's own word count.
matches
     [,1] [,2] [,3] [,4]
[1,]    4    3    2    2
[2,]    3    3    2    2
[3,]    2    2    2    2
[4,]    2    2    2    4
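To see why the multiplication counts shared words, it may help to check one entry by hand and then look up each group's closest match. The sketch below hard-codes `mat` from the table above so it runs on its own; `off_diag`, `best_count` and `best_group` are names I made up for illustration. It assumes every count is 0 or 1. If a word can repeat within a group, you would probably want `(mat > 0) * 1` first so repeats are not counted multiple times.

```r
# Document-term matrix copied from the tibble output (group column dropped).
mat <- matrix(c(1, 1, 1, 1, 0, 0,
                1, 1, 1, 0, 0, 0,
                1, 1, 0, 0, 0, 0,
                1, 1, 0, 0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("cat", "dog", "horse", "mouse", "cow", "frog")))

matches <- mat %*% t(mat)

# Entry [i, j] is the dot product of rows i and j: with 0/1 counts,
# a term contributes 1 only when both groups contain that word.
shared_1_2 <- sum(mat[1, ] * mat[2, ])
shared_words_1_2 <- colnames(mat)[mat[1, ] > 0 & mat[2, ] > 0]

# The diagonal is a group matched with itself, so blank it out
# before looking for the best match with another group.
off_diag <- matches
diag(off_diag) <- 0

# Largest number of shared words per group, and the first group that reaches it.
best_count <- apply(off_diag, 1, max)
best_group <- apply(off_diag, 1, which.max)
data.frame(group = seq_len(nrow(mat)), best_group, best_count)
```

Note that `which.max` only returns the first maximum, so ties (groups 3 and 4 here match every other group equally) are silently resolved in favour of the lowest group number.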
Then you can play with things from there. For example, pulling the row and column IDs and then the upper triangular part of the matrix can give you a summary. I think from here it is just a matter of how you want to filter the table.
data.frame(row = c(row(matches)),
           col = c(col(matches)),
           value = c(matches),
           upper = c(upper.tri(matches))) %>%
  filter(upper == TRUE)
  row col value upper
1   1   2     3  TRUE
2   1   3     2  TRUE
3   2   3     2  TRUE
4   1   4     2  TRUE
5   2   4     2  TRUE
6   3   4     2  TRUE
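As one possible way to filter that table, here is a base R sketch that builds the same pair list with `which(..., arr.ind = TRUE)`, sorts it, and keeps only the pairs above a cutoff. The `min_shared` threshold is a hypothetical value I picked for illustration, and `matches` is hard-coded from the output above so the snippet runs on its own.

```r
# Match matrix copied from the output above.
matches <- matrix(c(4, 3, 2, 2,
                    3, 3, 2, 2,
                    2, 2, 2, 2,
                    2, 2, 2, 4),
                  nrow = 4, byrow = TRUE)

# Row/column indices of the upper triangle, i.e. each pair of groups once.
idx <- which(upper.tri(matches), arr.ind = TRUE)

pairs <- data.frame(row   = idx[, "row"],
                    col   = idx[, "col"],
                    value = matches[idx])

# Strongest pairs first.
pairs <- pairs[order(-pairs$value), ]

# Hypothetical cutoff: keep pairs that share at least this many words.
min_shared <- 3
strong <- pairs[pairs$value >= min_shared, ]
strong
```

With the example data only groups 1 and 2 survive the cutoff. Lowering `min_shared` to 2 would keep all six pairs.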