I have a graph, have computed the PageRank of its vertices, and would now like to compute clusters for the 20 nodes with the highest PageRank. I am using graph-tool and networkx so far.

Is there a known and practical way to do this?

eyllanesc
  • Welcome to Stack Overflow. I have a question for you: can you clarify exactly what you mean by "cluster" here? There are several definitions that you could be referring to. – Joel Jul 31 '19 at 01:06
  • I don't see any cluster definition that seems a fit here. Please be explicit about what you consider a "cluster", in particular what a good cluster would be. Do you want to assign each node to the nearest selected neighbor? That is more a classification than a clustering, because it does not attempt to discover *new* structure. – Has QUIT--Anony-Mousse Jul 31 '19 at 08:00
  • Hello guys, thanks for the replies! What I mean by cluster is more like a partition. I would like to assign each node (only the nodes in the top 20 pagerank vector) a partition, so that I can afterwards compare those nodes with the nodes on their computed partition. Is this clearer? Sorry, but I am a bit of a noob :D – HereComesWalrus Aug 01 '19 at 14:51
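One possible reading of the comments above is "take the 20 top-ranked nodes as seeds and assign every other node to its nearest seed". The following is only a minimal sketch of that reading, not an established answer to the question. The helper names `top_k` and `partition_by_seeds` and the plain-dict input format are assumptions made for illustration; distance is counted in hops, and for a directed graph the search follows whatever edge direction `adj` encodes.

```python
from collections import deque

def top_k(rank, k=20):
    """Return the k nodes with the highest score in a {node: score} mapping."""
    return sorted(rank, key=rank.get, reverse=True)[:k]

def partition_by_seeds(adj, seeds):
    """Assign every reachable node to the seed closest in hop count.

    adj   -- mapping {node: iterable of neighbours} (assumed input format)
    seeds -- nodes that each anchor one partition
    Ties go to the seed listed first; unreachable nodes are left out.
    """
    owner = {s: s for s in seeds}
    queue = deque(seeds)
    # Multi-source breadth-first search: every seed expands one hop per round.
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in owner:
                owner[neighbour] = owner[node]
                queue.append(neighbour)
    # Group the nodes by the seed that claimed them.
    parts = {s: set() for s in seeds}
    for node, seed in owner.items():
        parts[seed].add(node)
    return parts
```

With a PageRank dict `rank` already computed, `partition_by_seeds(adj, top_k(rank, 20))` would give one set of nodes per top-ranked node. networkx also ships `nx.voronoi_cells(G, center_nodes)`, which implements a similar nearest-center idea directly on a graph object.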

1 Answer

Since your question is a bit vague, I'll try to answer assuming that you are looking for a way to get the central cluster of your document collection. In this picture, the central 5-item cluster would be [B,C,E,F,D].

[Image: page rank]

In slightly Pythonic pseudocode, would it be something like this?

# pseudocode: the node.* helpers are placeholders, not a real API
center = node.with_highest_rank()
cluster = {center: {}}
current_connexion = center
while len(cluster) < 20:
    # best-ranked node citing current_connexion that is not in the cluster yet
    main_connexion = node.citing_node_with_higher_rank(current_connexion).filter(not in cluster.keys())
    # add the new member under its own key (the original overwrote a literal "center" key)
    cluster[main_connexion] = {}
    # if ranks are higher on connexion level 2 than the next node on level 1, look down
    if node.citing_node_with_higher_rank(main_connexion).rank > node.citing_node_with_higher_rank(current_connexion).rank:
        current_connexion = main_connexion
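For something you can actually run, here is a rough sketch of the same greedy walk in plain Python. It is one interpretation of the pseudocode, not a standard algorithm: the function name `greedy_central_cluster` and the `citers` input format are assumptions, and the walk simply stops at a dead end instead of backtracking.

```python
def greedy_central_cluster(citers, rank, size=20):
    """Grow a cluster from the top-ranked node by following citing nodes.

    citers -- mapping {node: iterable of nodes that cite it} (assumed format)
    rank   -- mapping {node: PageRank score}
    """
    center = max(rank, key=rank.get)
    cluster = [center]
    current = center

    def best_citer(node):
        # Highest-ranked node citing `node` that is not in the cluster yet.
        candidates = [c for c in citers.get(node, ()) if c not in cluster]
        return max(candidates, key=rank.get) if candidates else None

    while len(cluster) < size:
        chosen = best_citer(current)
        if chosen is None:
            break  # dead end: this simple walk does not backtrack
        cluster.append(chosen)
        deeper, sibling = best_citer(chosen), best_citer(current)
        # Descend when the best second-level citer outranks the next first-level one.
        if deeper is not None and (sibling is None or rank[deeper] > rank[sibling]):
            current = chosen
    return cluster
```

The function returns the members in the order they were picked, starting with the center. If your graph is a networkx `DiGraph`, its `G.pred` view should have the shape expected for `citers`, and `nx.pagerank(G)` returns a dict usable as `rank`.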

Advice: the audience on Stack Overflow is mostly developers, who need a concrete use case, concrete code, and precise definitions. If you have a more general, theoretical or scientific question (here, graph theory), have a look at other communities such as Computer Science.

zar3bski
  • Hello David, thanks for the reply! I think I didn't explain myself well. What I am looking for is a way to find a partition for each of the nodes with the highest PageRank values. I am only looking to find 20 partitions, one for each of these nodes. Right now I only have the sorted PageRank vector and the graph itself, and was looking for ways to do this partitioning (maybe it's more commonly referred to as clustering or community finding). – HereComesWalrus Aug 01 '19 at 14:53