I have Python code that uses the igraph library:

import igraph
edge =  [(0, 6), (0, 8), (0, 115), (0, 124), (0, 289), (0, 359), (0, 363), (6, 60), (6, 115), (6, 128), (6, 129), (6, 130), (6, 131), (6, 359), (6, 529), (8, 9), (8, 17), (8, 115)]
G = igraph.Graph(edges=edge, directed=False)
G.vs['label'] = nodes
G.es["weight"] = weights
dendrogram = G.community_edge_betweenness()
clusters = dendrogram.as_clustering()
membership = clusters.membership
out = pd.Series(membership, index=nodes)

and I need to convert it to the networkx library:

import networkx as nx
G = nx.Graph(edges)
dendrogram = nx.edge_betweenness_centrality(G)
clusters = nx.clustering(dendrogram)
membership = clusters.membership
out = pd.Series(membership, index=nodes)

However, the dendrogram cannot be clustered this way in networkx. Can someone help me replicate the igraph clustering code in networkx?

Szabolcs
Dgstah
  • I'm not very familiar with `igraph`, so could I ask you to explain what `as_clustering()` does? From looking at the documentation, it's not at all like the `nx.clustering` command (which returns the clustering coefficient for each node) – Joel May 19 '19 at 10:53
  • igraph has not been discontinued, this is simply not true. – Szabolcs Aug 15 '19 at 08:21

1 Answer

The problem is that "clustering" refers to two different things in network science. It refers either to the clustering coefficient (the fraction of possible triangles through a node that actually exist; nx.clustering) or to a grouping of nodes (a.k.a. data clustering, network community, node partition, etc.).
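To make the distinction concrete, here is a small sketch on a hypothetical toy graph (a triangle with one extra node attached; not the graph from the question). `nx.clustering` returns one coefficient per node, while `girvan_newman` yields node partitions:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Hypothetical toy graph: a triangle (0, 1, 2) with node 3 hanging off node 2.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

# Meaning 1: the clustering coefficient, a dict mapping node -> float.
coefficients = nx.clustering(G)
print(coefficients)

# Meaning 2: a node partition, here the first split of the
# Girvan-Newman dendrogram, a tuple of node sets.
first_split = next(girvan_newman(G))
print(first_split)
```

The first result describes each node individually; the second assigns every node to a group, which is what the igraph code in the question produces.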

In this case you are using igraph's community_edge_betweenness() to hierarchically cluster your nodes, and then cutting the dendrogram with dendrogram.as_clustering() to create a node partition.

The equivalent in networkx would be to use girvan_newman:

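import itertools

import pandas as pd
from networkx.algorithms.community.centrality import girvan_newman

k = 4  # maximum number of communities to keep

# girvan_newman yields successively finer partitions (tuples of node sets)
nx_dendrogram = girvan_newman(G)
# Walk down the dendrogram while it has at most k communities
move_down_dendrogram = itertools.takewhile(lambda c: len(c) <= k, nx_dendrogram)
for c in move_down_dendrogram:
    clustering_list = c  # keep the last (finest) partition seen
print(clustering_list)

# Assumes the nodes of G are the integers 0..N-1, as in the igraph version
membership = [0] * G.number_of_nodes()
for ic, cset in enumerate(clustering_list):
    for n in cset:
        membership[n] = ic
out = pd.Series(membership, index=nodes)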
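If the number of communities is not known in advance, one common option is to keep the dendrogram level with the highest modularity. As far as I understand the igraph documentation, as_clustering() called without an argument picks its cut by a similar criterion, but treat that as an assumption. Below is a minimal sketch; the helper name `best_girvan_newman_partition`, its `max_communities` parameter, and the toy graph are all made up for illustration. It also stores the membership in a dict, so node labels do not need to be 0..N-1.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms.community import girvan_newman, modularity


def best_girvan_newman_partition(G, max_communities=None):
    """Return (partition, score) for the Girvan-Newman level with the highest modularity."""
    best_partition, best_score = None, float("-inf")
    for partition in girvan_newman(G):
        # Optionally stop early; levels only get finer from here on.
        if max_communities is not None and len(partition) > max_communities:
            break
        score = modularity(G, partition)
        if score > best_score:
            best_partition, best_score = partition, score
    return best_partition, best_score


# Hypothetical toy graph: two triangles joined by a single bridge edge.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)])
partition, score = best_girvan_newman_partition(G)

# Map node label -> community id (works for non-contiguous labels too).
membership = {node: cid for cid, cset in enumerate(partition) for node in cset}
out = pd.Series(membership).sort_index()
print(out)
```

Note that this walks the whole dendrogram, and Girvan-Newman recomputes edge betweenness after every edge removal, so it can be slow on large graphs; passing `max_communities` bounds the search.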
ComplexGates
  • The above code works! However, is there a method for choosing the value of k, i.e. the optimal number of communities to create? My dataset contains about 1600 keywords to be grouped based on text matching within keywords. – Dgstah May 20 '19 at 09:20