
I ask this question with apologies for my obvious mathematical shortcomings as a practical programmer. It's been more than 40 years since I did well in high-school algebra and then failed at anything higher. The concept of "NP-complete" and "NP-hard" problems has been difficult to grasp, but I've tried. I even bought and studied what appears to be the original guide to this class of problems, Computers and Intractability: A Guide to the Theory of NP-Completeness by Michael R. Garey and David S. Johnson.

https://goodreads.com/book/show/284369.Computers_and_Intractability/

Be that as it may. It is to be hoped the question itself is clear enough. What's the best publicly available brute-force algorithm (with efficient branch pruning) thus far for the specific problem of extracting all unique, complete subgraphs (that is, subgraphs in which every pair of distinct nodes, or vertices, is connected by an edge) from any random undirected graph? That is to say, the algorithm should first extract the largest unique, complete subgraphs, however many there might be, and then in that order extract all smaller unique, complete subgraphs that are not encompassed by any larger one, thus avoiding the duplicative production of non-unique (implied) results.

Ouch, trying to spell it out in clear English like that makes my head hurt a bit. It is to be hoped that this description is nonetheless straightforward enough. A standard C/C++/(or even Python) library to provide this functionality with reasonable computational resources such as a Ryzen 5 3600 box with 64GB/128GB of DRAM would be great, especially if a complete analysis thereof with 1,024 nodes could be finished within a day or two, but I'll take what I can get with many thanks.
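To pin down what's being asked, here is a tiny brute-force sketch (purely illustrative and exponential in time, so suitable only for toy graphs) that enumerates exactly those subgraphs, largest first. The helper names are my own inventions, not from any library:

```python
# Brute force for toy graphs only: try every subset of vertices,
# keep those that are complete (all pairs connected) and maximal
# (not contained in any larger complete subgraph already found).
from itertools import combinations

def is_clique(adj, subset):
    """True if every pair of vertices in `subset` is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(subset, 2))

def maximal_cliques_bruteforce(adj):
    nodes = list(adj)
    found = []
    for size in range(len(nodes), 0, -1):  # largest subsets first
        for subset in combinations(nodes, size):
            s = set(subset)
            if not is_clique(adj, s):
                continue
            if any(s < set(c) for c in found):  # inside a bigger clique
                continue
            found.append(subset)
    return found

# Triangle {a, b, c} plus a pendant edge c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(maximal_cliques_bruteforce(adj))  # [('a', 'b', 'c'), ('c', 'd')]
```

This checks all 2^n subsets, so it is hopeless at 1,024 nodes; it only nails down the specification that the serious algorithms below satisfy efficiently.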

And if there's a FAQ or essay somewhere on the Web that covers this topic in English that can be understood by a non-mathematician, then that'd be even better!

Edit: The language in the following paper is admittedly a little over my head, but for you computational mathematicians out there, can you confirm that it does in fact substantially address the core problem itself? If so, I can begin a heroic effort to understand this "Bron–Kerbosch" algorithm with faith that it's the correct track to follow. -_-

"The worst-case time complexity for generating all maximal cliques and computational experiments" by Etsuji Tomita, Akira Tanaka, and Haruhisa Takahashi

(The University of Electro-Communications, Department of Information and Communication Engineering, Chofugaoka 1-5-1, Chofu, Tokyo 182-8585, Japan) (Toyota Techno Service Corporation, Imae 1-21, Hanamotocho, Toyota, Aichi 470–0334, Japan)

https://snap.stanford.edu/class/cs224w-readings/tomita06cliques.pdf

owlsupport
  • Hello @owlsupport, and welcome to Stack Overflow! I can quickly confirm that the paper you link is a good approach. It gives an algorithm for finding all maximal cliques (which are complete subgraphs that are not subsumed by larger subgraphs, as you desire) and does so in proven optimal asymptotic time in the worst case. However, your graph is likely not the worst case, in which case there may be more recent faster approaches. Could you perhaps give some more information about the graph? Specifically, the number of edges may be very important, as sparse graphs are much easier for this problem. – ADdV Jan 28 '21 at 13:43
  • If the graph in question is not sensitive information, you might also consider simply uploading the graph somewhere. – ADdV Jan 28 '21 at 13:44
  • Thank you very much, ADdV! That paper is looking more and more interesting! The complex meanings therein are starting to soak into my cognitive horizon. It makes perfect sense that a sparse graph would with proper pruning heuristics impose a much lesser load on limited computational resources. Truthfully, I'm more interested here in understanding how to write such code myself for smaller instances of the classic clique problem (as seen in the aforementioned paper). This will pave the way for more complex datasets that address certain real-world problems. – owlsupport Jan 28 '21 at 17:22
  • BTW, the figure of 1,024 nodes reflects a rough-and-ready estimate of the likely size of such practical problems. I picked that number in part because it's large enough to be interesting while keeping a lid on the beefy datasets produced thereby — a maximal clique of up to 1,024 nodes can, if I'm not mistaken, be more or less efficiently represented as a bit array that fits in 128 bytes of DRAM or hard drive storage, not counting overhead. – owlsupport Jan 28 '21 at 17:28
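The 128-byte estimate in the last comment checks out: 1,024 bits is exactly 128 bytes, and a vertex-membership bitmask is straightforward to pack. A minimal sketch (the helper name is my own, hypothetical):

```python
# Sketch: represent a clique over up to 1,024 labeled vertices as a
# bit array. 1,024 bits = 128 bytes, matching the comment's estimate.
N = 1024

def clique_to_bits(members):
    """Pack a set of vertex indices (0..N-1) into a 128-byte bitmask."""
    mask = 0
    for v in members:
        mask |= 1 << v          # set the bit for vertex v
    return mask.to_bytes(N // 8, "little")

blob = clique_to_bits({0, 5, 1023})
print(len(blob))  # 128
```

Bitmasks also make clique operations cheap: intersection and subset tests become bitwise AND on 128-byte words, which is why many fast clique solvers store adjacency rows exactly this way.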

1 Answer


Yes, Bron–Kerbosch is what you want. There's an implementation in NetworkX, readable pseudocode on Wikipedia if you know your set operators, a Python implementation by yours truly, and many more discoverable by searching.
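For concreteness, here is a compact sketch of Bron–Kerbosch with pivoting along the lines of the Wikipedia pseudocode (my own sketch under that reading, not the answerer's implementation):

```python
# Bron–Kerbosch with pivoting: R is the growing clique, P the candidate
# vertices that extend R, X the vertices already exhausted (so each
# maximal clique is reported exactly once).
def bron_kerbosch(adj):
    """Yield every maximal clique of a graph given as {vertex: set_of_neighbors}."""
    def expand(R, P, X):
        if not P and not X:
            yield sorted(R)     # R is maximal: nothing can extend it
            return
        # Pivot on the vertex covering the most of P, pruning branches.
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            yield from expand(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)
            X.add(v)
    yield from expand(set(), set(adj), set())

# Triangle {0, 1, 2} plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(bron_kerbosch(adj), key=len, reverse=True))  # [[0, 1, 2], [2, 3]]
```

The pivot rule is the same idea the Tomita et al. paper analyzes; NetworkX's `find_cliques` is a production-quality version of this routine, so for real workloads prefer it over a hand-rolled sketch.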

David Eisenstat
  • Ah, thank you! Just knowing the right path to follow is enormously helpful. It's been a while, but set operators are slowly returning to my memory. I'll check out your undoubtedly excellent Python implementation! ^_^ – owlsupport Jan 28 '21 at 17:09
  • @owlsupport I forgot that one was actually maximum clique, but it works by enumerating all maximal cliques with pruning, so you can just toss the pruning parts (or use someone else's; this isn't exactly a complicated algorithm) – David Eisenstat Jan 28 '21 at 21:59
  • Thank you, David Eisenstat — I think those terms are becoming clear to me as well. The maximum clique (or cliques) is (are) a subset of all maximal cliques, yes? One can either be happy with the maximum clique (or cliques) and halt further processing, or else continue with added pruning to avoid evoking subsets of existing maximal cliques, yes? Sure hope I've gotten it right at last. ^_^ – owlsupport Jan 31 '21 at 15:46
  • @owlsupport Yes. – David Eisenstat Jan 31 '21 at 15:54
  • BTW, as an arguably irrelevant aside, I've been contemplating the essential nature of the so-called class of "NP-complete" problems. I think the clique problem as applied to an undirected graph is illustrative in showing the deceptive conceptual simplicity of extracting all maximal cliques. Yes, an undirected graph with 2,048 vertexes contains only twice as many vertexes as an undirected graph with 1,024 vertexes, but the resulting possible connecting edges lead to a combinatorial explosion of computational complexity. Thus the very interesting question of whether or not "P=NP". :-) – owlsupport Jan 31 '21 at 16:19
  • Ugh, that last bit didn't come out right, did it? Let's see ... doubling the number of vertexes in, say, an undirected graph with 1,024 vertexes results in (approximately) quadrupling the number of possible edges (in an undirected graph with 2,048 vertexes). Doubling that again to 4,096 vertexes results again in (approximately) quadrupling the number of possible edges. I still suck at mathematics, but the number of possible edges is quite close to increasing as the square of the number of vertexes, right? That's exponential growth, isn't it? O_o – owlsupport Jan 31 '21 at 19:24
  • @owlsupport yes, definitely exponential. It's still often possible to enumerate all cliques in graphs of that size if they have sufficiently few edges relative to vertices though. – David Eisenstat Jan 31 '21 at 19:28
  • Yah, that makes perfect sense — there's a staggering difference between the potential computational complexity of a heavily populated graph with edges teeming everywhere and of a sparse graph populated only by a light scattering of lonely edges. If I understand correctly at a conceptual level the likely practical effects of efficient pruning heuristics, adding more constraints such as directional edges has the almost paradoxical result of decreasing the computational load as patently impossible paths to a desired solution are thereby weeded out more quickly in any given analytical subtree. :-) – owlsupport Jan 31 '21 at 20:02
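For precision on the growth rates discussed in the comments just above, which are easy to conflate: the number of possible edges grows quadratically, n·(n−1)/2, which is polynomial; it is the number of maximal cliques that can grow exponentially in the worst case (up to 3^(n/3), the Moon–Moser bound). A quick sketch:

```python
# Possible edges in an undirected graph on n vertices: n*(n-1)/2.
# Doubling n roughly quadruples this count (quadratic, not exponential);
# the exponential blow-up lives in the number of maximal cliques,
# which can reach 3**(n//3) in the worst case (Moon–Moser graphs).
def possible_edges(n):
    return n * (n - 1) // 2

for n in (1024, 2048, 4096):
    print(n, possible_edges(n))
# 1024 523776
# 2048 2096128
# 4096 8386560
```

This is why the sparse/dense distinction raised earlier matters so much: the input size grows only quadratically, while the output (the set of maximal cliques) is what can explode.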