5

Given a symmetric binary similarity matrix M (1 = similarity), I want to extract all (potentially overlapping) subsets, where all elements within a set are mutually similar.

  A B C D E
A 1 1 0 0 0
B 1 1 1 1 0
C 0 1 1 1 1
D 0 1 1 1 1
E 0 0 1 1 1

Also, sets contained within other sets should be discarded (e.g. {D,E} is contained in {C,D,E}). For the matrix the result would be: {A,B}, {B,C,D}, {C,D,E}

  1. How can I easily achieve this?
  2. I suspect that there is some clustering algorithm for this, but I am unaware of the name for these types of problems. To which (mathematical) class of problems does this task belong to?

Code

M <- matrix(c(1,1,0,0,0,
              1,1,1,1,0,
              0,1,1,1,1,
              0,1,1,1,1,
              0,0,1,1,1), ncol = 5, byrow = TRUE)
colnames(M) <- rownames(M) <- LETTERS[1:5]

PS. While this may smell like some homework assignment, but its actually a problem I encountered in my job :)

Mark Heckmann
  • 10,943
  • 4
  • 56
  • 88
  • 2
    does `igraph::max_cliques(g, min=2)` do what you want? – user20650 Nov 09 '19 at 20:45
  • 2
    Related: [Generating distinct groups of nodes in a network](https://stackoverflow.com/questions/55910373/generating-distinct-groups-of-nodes-in-a-network). You may start with `g <- graph_from_adjacency_matrix(M)` – Henrik Nov 09 '19 at 20:55
  • Yes, this appears to be exactly what I was looking for. Do you want to turn this into an answer? – Mark Heckmann Nov 09 '19 at 21:04

1 Answers1

4

A clique is a subgraph that is completely connected.

What you are looking for is hence (maximal) clique detection.

https://en.wikipedia.org/wiki/Clique_problem

Beware that the results can be much larger than you anticipate. Consider a graph where each edge is 1 with a probability of p. For p close to 1, almost any subset is a clique. Finding maximum cliques then becomes expensive. P can also be chosen to maximize the number of maximal cliques...

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194