
I am trying to get my PageRank values from igraph in R to match those I get from Gephi. I followed this example: https://www.briggsby.com/personalized-pagerank and my igraph values match the weighted values it reports. Gephi, however, produces different values for weighted PageRank, and I'm unsure why. When I run an unweighted PageRank, igraph and Gephi give the same results.

The network I'm importing is deliberately simple so that the math is easy to check:

Source Target Weight
A B 1.0
B C 1.0
C B 1.0
C A 0.5
A C 1.0
C D 0.1
D A 0.5

The code I'm using is as follows:

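library(igraph)
library(plyr)
set.seed(123)
# TestPageRank is the imported edge list with columns Source, Target and Weight
mydf <- data.frame(from = TestPageRank$Source, to = TestPageRank$Target)
mygraph <- graph_from_data_frame(mydf, directed = TRUE)
# Edge order follows the rows of mydf, so the Weight column lines up with the edges
pr <- data.frame(users = V(mygraph)$name,
                 page_rank = page_rank(mygraph, directed = TRUE, damping = 0.85,
                                       weights = TestPageRank$Weight)$vector,
                 degree = degree(mygraph))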

The PageRanks I'm returning are as follows:

Node  igraph weighted PageRank  Gephi weighted PageRank
A     0.1960                    0.2373
B     0.3373                    0.2761
C     0.4075                    0.3732
D     0.0591                    0.1133

In this example, the ranking is at least the same, but when I apply this to my larger networks with thousands of nodes, the node ranking by PageRank is very different. Any thoughts on why this might be? Or how I can modify my R code to match the Gephi PageRank values?

Here's the updated code with import:

df <- structure(list(Source = c("A", "B", "C", "C", "A", "C", "D"), 
                     Target = c("B", "C", "B", "A", "C", "D", "A"), 
                     Weight = c(1,1, 1, 0.5, 1, 0.1, 0.5)), 
                class = "data.frame", row.names = c(NA, -7L))

g <- graph_from_data_frame(df)
page_rank(g, weights = E(g)$Weight, directed = TRUE, damping = 0.85)$vector
degree(g)

And the output from the above:

         A          B          C          D 
0.19602465 0.33730560 0.40752024 0.05914951 
MAb2021

1 Answer


I am not able to reproduce your results with igraph. Please provide a minimal reproducible example, with copyable code. You will find guidance here.

Here is your datafile as copyable CSV:

Source,Target,Weight
A,B,1.
B,C,1.
C,B,1.
C,A,0.5
A,C,1.
C,D,0.1
D,B,0.5

We get this after using read.csv:

df <- structure(list(Source = c("A", "B", "C", "C", "A", "C", "D"), 
    Target = c("B", "C", "B", "A", "C", "D", "B"), Weight = c(1, 
    1, 1, 0.5, 1, 0.1, 0.5)), class = "data.frame", row.names = c(NA, 
-7L))
g <- graph_from_data_frame(df)
page_rank(g, weights = E(g)$Weight)
$vector
         A          B          C          D 
0.14857410 0.37354978 0.41816130 0.05971482 

Using the ARPACK method, which is an entirely distinct algorithm, we get the same:

> page_rank(g, weights = E(g)$Weight, algo = 'arpack')
$vector
         A          B          C          D 
0.14857410 0.37354978 0.41816130 0.05971482 

These numbers differ from what you quote, but I cannot tell why without a reproducible example.

I should note that I worked on igraph's PageRank code and I believe that it is exceedingly unlikely that it would give incorrect results.
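
The igraph numbers quoted in this thread can also be cross-checked without any graph library. The following is a minimal sketch in plain Python, not igraph's actual implementation: a power iteration in which a walker follows an outgoing edge with probability proportional to its weight and teleports uniformly with probability 1 - damping. The function name `weighted_pagerank` and the variable names are my own, and the code assumes every node has at least one outgoing edge, which holds for both edge lists (the corrected one from the question with D to A, and the one used in this answer with D to B).

```python
def weighted_pagerank(edges, damping=0.85, iterations=200):
    """Power iteration for weighted, directed PageRank with uniform teleportation.

    Assumption: no dangling nodes (every node has an outgoing edge).
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Teleportation share, identical for every node.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        # Each edge passes on a share of its source's rank, proportional to its weight.
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


common = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "B", 1.0),
          ("C", "A", 0.5), ("A", "C", 1.0), ("C", "D", 0.1)]
question_edges = common + [("D", "A", 0.5)]  # corrected data from the question
answer_edges = common + [("D", "B", 0.5)]    # data used in this answer

pr_question = weighted_pagerank(question_edges)
pr_answer = weighted_pagerank(answer_edges)
for node in "ABCD":
    print(node, round(pr_question[node], 4), round(pr_answer[node], 4))
```

Under these assumptions the two columns should agree with the igraph vectors shown in the question and in this answer, which suggests that igraph is computing the conventional weighted PageRank here.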

Szabolcs
  • My apologies, it's not reproducible because I had a typo in the data. The last row should be D-A, not D-B. I've updated it and added the code. I believe igraph is correct, since it matches what I get with Python. I had been using Gephi to "validate" my findings, but now I'm thinking the Gephi algorithm is incorrect. – MAb2021 Aug 05 '22 at 20:04
  • @MAb2021 Python does not have built in features to calculate PageRank. Which specific Python library are you talking about? – Szabolcs Aug 05 '22 at 20:28
  • I used networkx and followed the example in the link in my post. It matches what I get from igraph. But the two do not match Gephi. – MAb2021 Aug 05 '22 at 20:31
  • @MAb2021 I am not deeply familiar with Gephi, but typical reasons for differences in PageRank results are: 1. not using the same damping factor 2. not using the same directedness 3. different interpretations of how to proceed when the walker gets stuck — does not apply here as you have no sink nodes 4. different handling of multi-edges or self-loops — does not apply here as you don't have these – Szabolcs Aug 05 '22 at 20:36
  • These are all the reasons I went with this simplistic example dataset. My actual network has 2-3k nodes and many of the other factors you listed. But for validation, this set has no loops and no dangling nodes. Gephi allows the user to specify the damping factor (which I set to 0.85) and directedness, which is set to true. I have been using Gephi as my "gold standard" until now, but I think I will need to change my approach. – MAb2021 Aug 05 '22 at 20:42
  • @MAb2021 I just tried with Gephi, and I got the same results as with igraph. – Szabolcs Aug 06 '22 at 05:28
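
The checklist of typical causes in the comments above can be probed on the example network. The sketch below is a plain Python power iteration, not igraph's or Gephi's code; the function `pagerank` and its `directed` and `use_weights` flags are hypothetical names of my own. It assumes that "undirected" means every edge can be traversed in both directions with the same weight, and that ignoring weights means treating each edge as weight 1. It prints how far each setting moves the scores from the baseline (damping 0.85, directed, weighted).

```python
def pagerank(edges, damping=0.85, directed=True, use_weights=True, iterations=200):
    """Power iteration for PageRank; assumes no dangling nodes."""
    links = []
    for src, dst, weight in edges:
        w = weight if use_weights else 1.0
        links.append((src, dst, w))
        if not directed:
            # Assumption: an undirected edge is walkable both ways with equal weight.
            links.append((dst, src, w))
    nodes = sorted({n for src, dst, _ in links for n in (src, dst)})
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in links:
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, dst, w in links:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def max_abs_diff(a, b):
    """Largest per-node difference between two score dictionaries."""
    return max(abs(a[n] - b[n]) for n in a)


# Corrected edge list from the question (last row is D to A).
edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "B", 1.0), ("C", "A", 0.5),
         ("A", "C", 1.0), ("C", "D", 0.1), ("D", "A", 0.5)]

baseline = pagerank(edges)
variants = {
    "damping = 0.5": pagerank(edges, damping=0.5),
    "undirected": pagerank(edges, directed=False),
    "weights ignored": pagerank(edges, use_weights=False),
}
for label, ranks in variants.items():
    print(label, round(max_abs_diff(baseline, ranks), 4))
```

Each of these settings shifts the scores noticeably even on four nodes, so when two tools disagree it is worth confirming that both really use the same damping factor, directedness and weight column before suspecting either implementation.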