Given a symmetric binary similarity matrix M (1 = similarity), I want to extract all (potentially overlapping) subsets, where all elements within a set are mutually similar.
A B C D E
A 1 1 0 0 0
B 1 1 1 1 0
C 0 1 1 1 1
D 0 1 1 1 1
E 0 0 1 1 1
Also, sets contained within other sets should be discarded (e.g. {D,E} is contained in {C,D,E}). For the matrix above, the result would be: {A,B}, {B,C,D}, {C,D,E}
- How can I easily achieve this?
- I suspect that there is some clustering algorithm for this, but I do not know the name for this type of problem. To which (mathematical) class of problems does this task belong?
Code
M <- matrix(c(1,1,0,0,0,
1,1,1,1,0,
0,1,1,1,1,
0,1,1,1,1,
0,0,1,1,1), ncol = 5, byrow = TRUE)
colnames(M) <- rownames(M) <- LETTERS[1:5]
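If M is read as the adjacency matrix of an undirected graph, the sets described above appear to correspond to what graph theory calls maximal cliques. Below is a minimal base-R sketch of the basic Bron-Kerbosch recursion under that assumption. The helper name `maximal_cliques` is hypothetical, and the sketch assumes M is square and symmetric, with the diagonal ignored.

```r
# Hypothetical helper (not part of the original post): enumerate all maximal
# cliques of the graph whose adjacency matrix is M (basic Bron-Kerbosch,
# without pivoting). Assumes M is square and symmetric.
maximal_cliques <- function(M) {
  adj <- M != 0
  diag(adj) <- FALSE                       # ignore self-similarity
  nodes <- seq_len(nrow(adj))
  labels <- if (is.null(rownames(M))) nodes else rownames(M)
  found <- list()

  # clique: current clique; cand: vertices that can still extend it;
  # done: vertices already handled (used to detect non-maximal cliques)
  expand <- function(clique, cand, done) {
    if (length(cand) == 0 && length(done) == 0) {
      found[[length(found) + 1]] <<- clique  # nothing can extend it: maximal
      return(invisible(NULL))
    }
    for (v in cand) {
      nb <- nodes[adj[v, ]]                  # neighbours of v
      expand(c(clique, v), intersect(cand, nb), intersect(done, nb))
      cand <- setdiff(cand, v)
      done <- union(done, v)
    }
    invisible(NULL)
  }

  expand(integer(0), nodes, integer(0))
  lapply(found, function(idx) labels[sort(idx)])
}

# Usage with the matrix from the question
M <- matrix(c(1,1,0,0,0,
              1,1,1,1,0,
              0,1,1,1,1,
              0,1,1,1,1,
              0,0,1,1,1), ncol = 5, byrow = TRUE)
colnames(M) <- rownames(M) <- LETTERS[1:5]
maximal_cliques(M)
```

For the example matrix this should yield the three sets {A,B}, {B,C,D} and {C,D,E}. The number of maximal cliques can grow exponentially with the matrix size, so for large matrices a tuned implementation such as `max_cliques()` from the igraph package is likely a better choice.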
PS. While this may smell like a homework assignment, it's actually a problem I encountered at my job :)