
We are porting a Python application to .NET on Windows. The original application uses the NetworkX implementation of PageRank.

When we run the original code on the dataset below we get one set of results; when we run what we believe is the same dataset through igraph's PageRank, we get a different set.

Can anyone review the data below and give us some idea of what might be causing the discrepancy?

Starting Graph

From, To, Weight
------------------------------------------
[1, 2, 1.237635735532509]
[1, 3, 1.3176784432060453]
[2, 5, 0.1]
[2, 7, 1.6545276334003642]
[3, 0, 0.4013877113318902]
[3, 5, 0.9056698458264134]
[3, 7, 3.4462871026284194]
[4, 5, 0.9693717489378296]
[4, 7, 1.3176784432060453]
[5, 7, 1.6053605156578263]
[7, 2, 0.8068528194400547]
[7, 3, 0.9771288098085582]
[7, 4, 4.317678443206045]
[7, 5, 2.0108256237659905]

Results of running PageRank with NetworkX

0: 0.030658861877660655
1: 0.025151437717922904
2: 0.06899335192504014
3: 0.0767301059609998
4: 0.20435115331218195
5: 0.19799952556413375
7: 0.39611556364206074

Running PageRank with igraph

from igraph import Graph

h = Graph()
h.add_vertices([0,1,2,3,4,5,6,7])
h.add_edge(1, 2, weight = 1.237635735532509)
h.add_edge(1, 3, weight = 1.3176784432060453)
h.add_edge(2, 5, weight = 0.1)
h.add_edge(2, 7, weight = 1.6545276334003642)
h.add_edge(3, 0, weight = 0.4013877113318902)
h.add_edge(3, 5, weight = 0.9056698458264134)
h.add_edge(3, 7, weight = 3.4462871026284194)
h.add_edge(4, 5, weight = 0.9693717489378296)
h.add_edge(4, 7, weight = 1.3176784432060453)
h.add_edge(5, 7, weight = 1.6053605156578263)
h.add_edge(7, 2, weight = 0.8068528194400547)
h.add_edge(7, 3, weight = 0.9771288098085582)
h.add_edge(7, 4, weight = 4.317678443206045)
h.add_edge(7, 5, weight = 2.0108256237659905)

z = h.pagerank()

returns:

0.08263947646845539 
0.11209944263156851 
0.13863513488523824 
0.2088786898834253 
0.0909928717668216 
0.15533634946784883 
0.04713009918827309 
0.16428793570836897

h.pagerank(None, True, .85, 'weight', None, 'prpack', 1000, .001) returns:

0.06306529189761995
0.1213272777521786
0.12419698504275958
0.21601479253860403
0.0845752983652644
0.10892203451714054
0.05260867410276095
0.22928964578367186

h.pagerank(None, True, .85, 'weight', None, 'power', 1000, .001) returns:

0.05046861007484653
0.08032641955693953
0.1387381559084609
0.18249744338552665
0.10389267832310527
0.16623355776440546
0.019058750577540366
0.2587843844091753

Any guidance you can provide would be greatly appreciated.

SparkAndShine
Nick McKeel

1 Answer


There are a couple of things leading to the differences in the PageRank scores. First, the networkx graph does not have node 6 (the isolate), but the igraph graph does. Second, the igraph graph needs to be directed; Graph() creates an undirected graph by default. Once both are fixed, the scores are nearly identical (they agree to about the sixth decimal place).

import igraph as ig
import networkx as nx
G=nx.DiGraph()
G.add_nodes_from([0,1,2,3,4,5,6,7]) #Add node 6
G.add_edge(1, 2,weight= 1.237635735532509)
G.add_edge(1, 3,weight= 1.3176784432060453)
G.add_edge(2, 5,weight= 0.1)
G.add_edge(2, 7,weight= 1.6545276334003642)
G.add_edge(3, 0,weight= 0.4013877113318902)
G.add_edge(3, 5,weight= 0.9056698458264134)
G.add_edge(3, 7,weight= 3.4462871026284194)
G.add_edge(4, 5,weight= 0.9693717489378296)
G.add_edge(4, 7,weight= 1.3176784432060453)
G.add_edge(5, 7,weight= 1.6053605156578263)
G.add_edge(7, 2,weight= 0.8068528194400547)
G.add_edge(7, 3,weight= 0.9771288098085582)
G.add_edge(7, 4,weight= 4.317678443206045)
G.add_edge(7, 5,weight= 2.0108256237659905)

h = ig.Graph(directed = True)  #Ensure the graph is directed
h.add_vertices([0,1,2,3,4,5,6,7])
h.add_edge(1, 2, weight = 1.237635735532509)
h.add_edge(1, 3, weight = 1.3176784432060453)
h.add_edge(2, 5, weight = 0.1)
h.add_edge(2, 7, weight = 1.6545276334003642)
h.add_edge(3, 0, weight = 0.4013877113318902)
h.add_edge(3, 5, weight = 0.9056698458264134)
h.add_edge(3, 7, weight = 3.4462871026284194)
h.add_edge(4, 5, weight = 0.9693717489378296)
h.add_edge(4, 7, weight = 1.3176784432060453)
h.add_edge(5, 7, weight = 1.6053605156578263)
h.add_edge(7, 2, weight = 0.8068528194400547)
h.add_edge(7, 3, weight = 0.9771288098085582)
h.add_edge(7, 4, weight = 4.317678443206045)
h.add_edge(7, 5, weight = 2.0108256237659905)

Now, checking page rank:

>>> h.pagerank(None,True,.85,'weight',None,'arpack')
[0.02990667328959136,
 0.02453435976169968,
 0.06730062757414129,
 0.07484756185358077,
 0.199337429914656,
 0.19314195829041825,
 0.02453435976169968,
 0.38639702955421296]
>>> nx.pagerank(G,alpha=0.85,weight = 'weight')
{0: 0.029906698992551148,
 1: 0.02453435614919296,
 2: 0.06730055444151634,
 3: 0.07484747242070261,
 4: 0.19933699472630276,
 5: 0.19314246522466136,
 6: 0.02453435614919296, #Here is node 6, missing from your example
 7: 0.3863971018958797}

One thing I can't explain: the networkx documentation says it uses the power method, yet the power method in igraph produces different results, while arpack and prpack produce results very close to those from networkx.
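As a library-free cross-check, here is a minimal sketch of weighted PageRank by plain power iteration, run on the edge list from the question. It is not the actual code of networkx or igraph: the function name pagerank_power, the tolerance, and the choice to spread the score of dangling nodes (nodes with no out-edges) uniformly are assumptions made for illustration. The symmetrized edge list is only a rough stand-in for an undirected graph and is not meant to reproduce igraph's undirected numbers.

```python
# Illustrative sketch only; pagerank_power is a hypothetical helper,
# not part of networkx or igraph.
EDGES = [
    (1, 2, 1.237635735532509), (1, 3, 1.3176784432060453),
    (2, 5, 0.1), (2, 7, 1.6545276334003642),
    (3, 0, 0.4013877113318902), (3, 5, 0.9056698458264134),
    (3, 7, 3.4462871026284194), (4, 5, 0.9693717489378296),
    (4, 7, 1.3176784432060453), (5, 7, 1.6053605156578263),
    (7, 2, 0.8068528194400547), (7, 3, 0.9771288098085582),
    (7, 4, 4.317678443206045), (7, 5, 2.0108256237659905),
]


def pagerank_power(nodes, edges, alpha=0.85, tol=1e-12, max_iter=10000):
    """Weighted PageRank by power iteration on directed (src, dst, w) edges."""
    nodes = list(nodes)
    n = len(nodes)
    # Total outgoing weight per node, used to normalise each edge weight.
    out_weight = {v: 0.0 for v in nodes}
    for src, _dst, w in edges:
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: dangling nodes hand their score to every node equally.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {v: base for v in nodes}
        for src, dst, w in edges:
            new_rank[dst] += alpha * rank[src] * w / out_weight[src]
        # Stop once the total change between iterations is negligible.
        err = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if err < tol:
            break
    return rank


# Directed graph with the isolated node 6, as in the corrected code above.
with_6 = pagerank_power(range(8), EDGES)
# Directed graph without node 6, as in the original NetworkX run.
without_6 = pagerank_power([0, 1, 2, 3, 4, 5, 7], EDGES)
# Rough stand-in for an undirected graph: every edge also runs backwards.
undirected = pagerank_power(
    range(8), EDGES + [(dst, src, w) for src, dst, w in EDGES]
)

for label, result in [("with node 6", with_6),
                      ("without node 6", without_6),
                      ("symmetrized", undirected)]:
    print(label, {v: round(score, 6) for v, score in sorted(result.items())})
```

With node 6 included, the sketch settles on the same values as the arpack and networkx outputs above; without it, on the values from the question's NetworkX run. This does not explain why igraph's power implementation differs. It only suggests that the gap is not inherent to power iteration itself, at least when the iteration is run to a tight tolerance.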

paqmo