
I have a complete weighted graph as you can see in the image below:

[image: plot of the complete weighted graph]

The Goal: I want to be able to choose the number of clusters and the number of vertices in each cluster, using Python's implementation of igraph.

What I've Tried So Far:

import igraph
import cairo
import numpy as np

# Import data (see below, I've included this file)
graph2 = igraph.Graph.Read_Ncol('10_graph.ncol')

# Assigns weights to weights1
weights1 = graph2.es["weight"]

# Converts it to undirected graph
graph2.to_undirected()

# 'graph2.to_undirected()' strips the graph of its weights
# so we restore them to the "weight" attribute after
graph2.es["weight"] = weights1

# Reduces the number of significant figures in each edge label
graph2.es["label"] = np.around(weights1, 2)

# Label all the vertices
graph2.vs["label"] = range(1, 11)

# Things I've tried: (uncomment only one at a time)
# Both return non-clustered graphs.
#community = graph2.community_spinglass(weights1)
community = graph2.community_leading_eigenvector(weights=graph2.es["weight"], clusters=3)
igraph.plot(community)

If the above code is run, you get as output the above image. You get the same image for both community-finding algorithms I've included. I've commented out one of them, so if you want to use the other one, go ahead and uncomment #community = graph2.community_spinglass(weights1).

The Problem(s):

  • It looks like none of the graphs are being clustered the way I want them to.
    • I pass weights=graph2.es["weight"], the list of weights corresponding to the edges in the graph.
    • I also explicitly pass clusters=3 to community_leading_eigenvector()
    • I am still not getting any clustering based on the edge weights of this graph.
    • How to draw proper clusters, either by color, or location, or however iGraph handles differentiation of clusters?
  • I am unable to find any official documentation about how to choose the number of vertices in each cluster.
    • Is there a way (even roundabout) to choose the number of vertices in each cluster? It doesn't have to be exact, but approximate.

10_graph.ncol

Here's the .ncol file I import to form the graph.

10_graph.ncol =

0 1 0.859412093436
0 2 0.696674188289
0 3 0.588339776278
0 4 0.5104097013
0 5 0.462457938906
0 6 0.427462387255
0 7 0.40350595007
0 8 0.382509071902
0 9 0.358689934558
1 2 0.912797848896
1 3 0.78532402562
1 4 0.681472223562
1 5 0.615574694967
1 6 0.567507619872
1 7 0.534715438785
1 8 0.506595029246
1 9 0.474297090248
2 3 0.941218154026
2 4 0.83850483835
2 5 0.759542327211
2 6 0.70025846718
2 7 0.659110815342
2 8 0.624313042633
2 9 0.584580479234
3 4 0.957468322138
3 5 0.886571688707
3 6 0.821838040975
3 7 0.772665012468
3 8 0.730820137423
3 9 0.684372167781
4 5 0.97372551117
4 6 0.92168855187
4 7 0.870589109091
4 8 0.823583870451
4 9 0.772154420843
5 6 0.98093419661
5 7 0.941236624882
5 8 0.895874086289
5 9 0.843755656833
6 7 0.985707938753
6 8 0.9523988462
6 9 0.906031710578
7 8 0.988193527182
7 9 0.955898136286
8 9 0.988293873257
HoldOffHunger
jackzellweger

1 Answer


Both methods are just returning a single cluster. This tells me that there's no clear separation between your vertices: they're just a big tangle, so there's no reasonable way to pull them apart.

If I edit the edge weights to have clear separations, like in 10_g2.ncol below, then the clustering algorithms do divide the vertices.

At first this did not produce the groups I expected. I put high weights within the vertex sets {0,1,2,3}, {4,5,6}, and {7,8,9}, and low weights between different sets. But spinglass splits it into {0,1,2,5,6}, {3,4}, and {7,8,9}, while leading_eigenvector splits it into {0,1,2,5,6} and {3,4,7,8,9}.

It turns out this is because to_undirected() changes the order of the edges, so when you reassign the edge weights after this operation, it associates them with different edges than before. To avoid this, you should instruct to_undirected to retain the edge attributes, e.g. by

graph2.to_undirected(combine_edges="max")

to retain the maximum value of each edge attribute (in case there are several directed edges between the same vertices), or

graph2.to_undirected(combine_edges="first")

to retain just the first value seen. (The method should be irrelevant in this case, since there are not multiple edges.)

Once you have actually split your graph into multiple clusters, the default plot method will differentiate them by colors. You can also use community.subgraph(i) to get the subgraph for the ith cluster and just draw that.

What about controlling the number of clusters? As you know, the community_leading_eigenvector method has a clusters parameter for the desired number of clusters, but it's apparently more a guideline than an actual rule: giving clusters=3 results in just 1 cluster with your data, and 2 clusters with mine.

You can get more precise control of the number of clusters with a method which returns a VertexDendrogram instead of a Clustering, such as community_edge_betweenness.

com3 = graph2.community_edge_betweenness(clusters=3, directed=False, weights="weight")

To get a clustering with n clusters, you call com3.as_clustering(n), which gave exactly n clusters for all my tests.

They're not necessarily good clusters:

In [21]: print(com3.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 7, 8, 9
[2] 6

In [22]: print(com3.as_clustering(4))
Clustering with 10 elements and 4 clusters
[0] 0
[1] 1, 2, 3, 4, 5, 8, 9
[2] 6
[3] 7

In [23]: print(com3.as_clustering(5))
Clustering with 10 elements and 5 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 4, 8, 9
[3] 6
[4] 7

In [24]: print(com3.as_clustering(6))
Clustering with 10 elements and 6 clusters
[0] 0
[1] 1, 3, 5
[2] 2, 8, 9
[3] 4
[4] 6
[5] 7

Other methods returning VertexDendrograms are community_walktrap and community_fastgreedy. They both seem to perform better for this particular example, IMO.

In [25]: com5 = graph2.community_walktrap(weights='weight')

In [26]: com6 = graph2.community_fastgreedy(weights='weight')

In [27]: print(com5.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9

In [32]: print(com6.as_clustering(3))
Clustering with 10 elements and 3 clusters
[0] 0, 1, 2, 5, 6
[1] 3, 4
[2] 7, 8, 9

Here is the more variegated weighting I used.

10_g2.ncol:

0 1 0.91
0 2 0.92
0 3 0.93
0 4 0.04
0 5 0.05
0 6 0.06
0 7 0.07
0 8 0.08
0 9 0.09
1 2 0.94
1 3 0.95
1 4 0.14
1 5 0.15
1 6 0.16
1 7 0.17
1 8 0.18
1 9 0.19
2 3 0.96
2 4 0.01
2 5 0.02
2 6 0.03
2 7 0.04
2 8 0.05
2 9 0.06
3 4 0.01
3 5 0.01
3 6 0.01
3 7 0.01
3 8 0.01
3 9 0.01
4 5 0.97
4 6 0.92
4 7 0.05
4 8 0.04
4 9 0.08
5 6 0.98
5 7 0.12
5 8 0.08
5 9 0.08
6 7 0.07
6 8 0.06
6 9 0.06
7 8 0.98
7 9 0.95
8 9 0.98
Nick Matteo
  • Do you think it would help the algorithm pick out clusters if I multiplied each weight by some coefficient, say 100? – jackzellweger Jun 17 '16 at 14:27
  • @jackskis: I actually tried that, and it didn't help. Some kind of exponential scaling might work. The dendrogram methods _will_ split your graph into the specified number of clusters, and some of them act reasonably with the original weights (others just separate one vertex at a time.) – Nick Matteo Jun 17 '16 at 14:31
  • Excellent. I will try that and report back. One more thing: given that this is a complete graph, do the clustering algorithms that use the dendrogram produce clusters such that the intra-cluster weights are maximal? If not, how are the clusters produced? After looking at the documentation, I can't find how the algorithm handles complete graphs with only edge weights to go off of. – jackzellweger Jun 17 '16 at 14:34
  • @jackskis: I don't know how the algorithms work. Their workings are not in the Python code, but in the igraph C library. I figured out that the edge weights were being jumbled by `to_undirected` (see the edited answer) and after I fixed that, the clustering reflects the weights as I'd expect. If you want, you can also delete edges with low weights (e.g. `graph2.delete_edges(weight_lt=0.5)` deletes all edges with weight less than 0.5) before clustering. But some algorithms (including `community_spinglass`) don't like it if the graph becomes disconnected when you do this. – Nick Matteo Jun 17 '16 at 16:35
  • There is a really important error in your solution: Your method of `graph2.to_undirected(combine_edges="max")` actually still scrambles the graph in what seems to be an arbitrary manner. Are there any alternatives? As is, your output graph is not picking out patterns in the original graph, but in some other graph that is arbitrarily produced by the `to_undirected` function. – jackzellweger Jul 05 '16 at 15:20
  • @jackskis: Are you still doing something like `graph2.es["weight"] = weights1`? That is what scrambles the weights. Using `combine_edges` leaves the edges with the same weights, using the given method to choose among various weights if there is more than one edge between the same vertices. Since in your case there is at most one edge between any given pair of vertices, the method doesn't matter. If you find that the weights are moved around, either you are doing that yourself somehow, or there is a bug in your copy of igraph. – Nick Matteo Jul 06 '16 at 14:49