I've been looking for packages that can create subgraphs with overlapping vertices. From what I understand, NetworkX and METIS can partition a graph into two or more parts, but I couldn't find how to partition a graph into subgraphs with overlapping nodes.

Suggestions on libraries that support partitioning with overlapping vertices would be really helpful.

EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.

import networkx as nx
from cdlib import algorithms
   
if __name__ == '__main__':
 
    g = nx.karate_club_graph()

    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # clustering parameters
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)

Output:

ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118

From the communities returned, I understand that communities 1 and 3 overlap (they share 8 nodes), but neither communities 1 and 2 nor communities 2 and 3 have an overlap of 4 nodes. It is not clear to me how the overlap threshold (4 overlapping nodes) has to be specified in `algorithms.angel(g, threshold=4, min_community_size=10)`. I tried setting `threshold=4` to define an overlap size of 4 nodes. However, the documentation available for angel says:

:param threshold: merging threshold in [0,1].

I am not sure how to translate an overlap of 4 nodes into a value within the bounds [0, 1]. Suggestions would be really helpful.
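To check the overlaps between the returned communities directly, a minimal sketch in plain Python could look like the following. It copies the community lists from the output above; the helper name `pairwise_overlaps` is hypothetical and not part of CDLIB.

```python
from itertools import combinations

# Communities copied from the ANGEL output printed above.
communities = [
    [14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8],
    [1, 12, 13, 17, 19, 2, 21, 3, 7, 8],
    [14, 15, 18, 2, 20, 22, 30, 31, 33, 8],
]


def pairwise_overlaps(coms):
    """Map each pair of community indices (1-based) to their shared nodes."""
    overlaps = {}
    for (i, a), (j, b) in combinations(enumerate(coms, start=1), 2):
        overlaps[(i, j)] = sorted(set(a) & set(b))
    return overlaps


overlaps = pairwise_overlaps(communities)
for pair, shared in overlaps.items():
    print(pair, len(shared), shared)
```

For the output above, this reports 8 shared nodes for communities 1 and 3, 1 shared node for communities 1 and 2, and 2 shared nodes for communities 2 and 3.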

Natasha
  • Have you checked the [GitHub repo](https://github.com/GiulioRossetti/ANGEL/blob/master/angel/alg/iAngel.py) for `threshold > 1`? – J. M. Arnold Jan 08 '21 at 10:35

1 Answer

You can check out CDLIB. It offers a large number of community detection algorithms that work with NetworkX graphs, including several that find overlapping communities.

Specifically about the angel algorithm in CDLIB:

According to the paper "ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks", the threshold is not an overlap threshold. It is used as follows:

If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.

  • Basically, this value determines whether to further merge the nodes into bigger communities, and is not equivalent to the number of overlapping nodes.

  • Also, don't confuse "labels" with node labels (as in `nx.relabel_nodes(G, labels)`). The "labels" referred to here appear to be those of the Label Propagation Algorithm, which ANGEL uses.
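To see why a fixed number of shared nodes does not map onto a single threshold value, here is a minimal sketch of a ratio-based merge rule. It assumes that "the ratio" is the share of the smaller community's nodes that also appear in the larger one; that is one plausible reading of the quote, not code taken from ANGEL, and the function names and example communities are hypothetical.

```python
def overlap_ratio(a, b):
    """Share of the smaller community's nodes also found in the larger one.

    Assumption: an illustrative reading of "the ratio", not ANGEL's own code.
    """
    small, large = sorted((set(a), set(b)), key=len)
    if not small:
        return 0.0
    return len(small & large) / len(small)


def should_merge(a, b, threshold):
    """Merge when the ratio is greater than or equal to the threshold."""
    return overlap_ratio(a, b) >= threshold


# Two hypothetical pairs that both share exactly 4 nodes.
small_pair = (range(0, 5), range(1, 9))      # 5 and 8 nodes, ratio 4/5
large_pair = (range(0, 20), range(16, 40))   # 20 and 24 nodes, ratio 4/20

for name, pair in (("small", small_pair), ("large", large_pair)):
    print(name, overlap_ratio(*pair), should_merge(*pair, threshold=0.5))
```

Under this assumption, the same 4 shared nodes trigger a merge for the small pair but not for the large one, so the threshold acts on a proportion rather than on a node count.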

As for the effects of varying this threshold:

[...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.

[Based on the comment by @J. M. Arnold]
From ANGEL's GitHub repository you can see that when `threshold >= 1`, only the `min_comsize` value is used:

self.threshold = threshold

if self.threshold < 1:
    self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
else:
    self.min_community_size = min_comsize
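To make the effect of that branch concrete, the quoted lines can be mirrored in a standalone function. This is only a sketch: the function name `effective_min_community_size` is hypothetical, and the logic is copied from the excerpt above rather than from the rest of the ANGEL code.

```python
def effective_min_community_size(threshold, min_comsize):
    """Mirror the quoted branch: how the minimum community size is derived."""
    if threshold < 1:
        # For thresholds below 1, the size is at least 3 and grows as the
        # threshold approaches 1.
        return max([3, min_comsize, int(1. / (1 - threshold))])
    # For thresholds of 1 or more, only min_comsize is used.
    return min_comsize


for threshold, min_comsize in [(4, 10), (4 / 34, 10), (0.5, 2), (0.75, 2)]:
    size = effective_min_community_size(threshold, min_comsize)
    print(threshold, min_comsize, size)
```

With `threshold=4` and `min_community_size=10`, as in the question, the function takes the `else` branch and returns 10.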
willcrack
  • My best guess would be `4/len(G)`, but I have never used that algorithm before – willcrack Jan 02 '21 at 23:08
  • Thank you, I tried `4/len(g.nodes)`, and the algorithm returned only 1 community, `[[0, 1, 10, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 23, 26, 27, 28, 29, 3, 30, 31, 32, 33, 4, 5, 6, 7, 8]]`, with 0.88 node coverage. Could you please suggest other algorithms from CDLIB that I could try for creating subgraphs with an overlap size of 4? – Natasha Jan 03 '21 at 03:28
  • I'm sorry, but I'm not an expert on CDLIB. I only knew about it because I tried to use their Leiden algorithm a few weeks ago. – willcrack Jan 03 '21 at 11:04
  • No issues. Thanks a lot for trying to help me out :) – Natasha Jan 03 '21 at 11:10
  • Thank you, I somehow missed the edit. Regarding "the merge is applied and the node label updated": does that mean the nodes returned in `communities` don't correspond to the node labels in the original graph? Could you please clarify? – Natasha Jan 08 '21 at 02:45
  • That value determines whether to further merge the nodes into bigger communities, not the number of overlapping nodes. – willcrack Jan 08 '21 at 10:56
  • Sorry, about your question: I believe the labels referred to have something to do with the [Label propagation algorithm](https://en.wikipedia.org/wiki/Label_propagation_algorithm), which is used by [ANGEL](https://appliednetsci.springeropen.com/articles/10.1007/s41109-020-00270-6) – willcrack Jan 08 '21 at 12:30