
I am running PageRank on a weighted DiGraph with 61,634 nodes and 28,378 edges.

  • pagerank(G) throws a ZeroDivisionError

  • pagerank_numpy(G) throws a ValueError: array is too big

  • pagerank_scipy(G) gives me the PageRanks, though

I can understand that the pagerank_numpy error is due to memory limitations, but why does pagerank fail? I tried adding an infinitesimal value to edges with zero weights, but the same issues persist. Some pointers would be nice.

Link to my GraphML file - https://mega.co.nz/#!xlYzEDAI!Lyh5pD-NJL61JPfkrNyJrEm0NnFc586A0MUD8OMYAO0

NetworkX version: 1.8.1, Python: 2.7

Dexter

2 Answers


@mtitan8's answer is good, but there is a little more to the story.

Since the time of your original question, the NetworkX code has been modified so that pagerank(), pagerank_numpy(), and pagerank_scipy() all give the same answer when there are zero or negative weights (https://github.com/networkx/networkx/pull/1001).

The results those functions produce when you have negative weights are probably not what you want (if the computation works at all). The algorithms now build the 'Google Matrix' from the input matrix (the weighted adjacency matrix of the graph) by dividing each row by its row sum; if the sum is zero, the entire row is set to zero. Note that the row sum might be negative.
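
As a minimal sketch of that normalization (a toy dense-matrix illustration, not the actual NetworkX code):

import numpy as np

# Toy weighted adjacency matrix. The second row sums to zero and the
# third row has a negative sum, illustrating the two problem cases.
A = np.array([[ 0.0, 1.0, 1.0],
              [-1.0, 0.0, 1.0],   # row sum == 0  -> entire row zeroed
              [-2.0, 0.0, 1.0]])  # row sum == -1 -> division flips signs

S = A.copy()
for i, s in enumerate(A.sum(axis=1)):
    if s == 0:
        S[i] = 0.0       # zero-sum row: set the whole row to zero
    else:
        S[i] = A[i] / s  # a negative s leaves negative entries in S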

If the resulting matrix still ends up with negative entries, then the Perron-Frobenius theorem (http://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem) doesn't apply, and you are not guaranteed a unique largest eigenvalue with a positive-valued eigenvector.

Aric
  • Thanks @Aric. I just looked at the Wikipedia article, and apparently there is an extension of the Perron-Frobenius theorem to non-negative matrices. I wonder if there's more to it than meets the eye. https://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem#Non-negative_matrices – Dexter Nov 13 '13 at 04:59
  • If you have negative entries in the "Google Matrix", the theorem doesn't apply, though. – Aric Nov 13 '13 at 12:52

pagerank fails because it performs its computation using stochastic_graph -- unlike pagerank_numpy or pagerank_scipy. From the docs, stochastic_graph requires:

A NetworkX graph, must have valid edge weights

This "valid edge weights" point (which is not explained at all, which I think is a mistake) is the source of your problem.

For a directed graph, stochastic_graph uses the out_degree of each node to normalize the edges. Again from the docs:

The [out] degree is the sum of the edge weights adjacent to the node.

So when you have edges with zero or negative weight, the normalization breaks with a ZeroDivisionError. Negative weights are an issue because they can cancel out positive weights and leave a node with a weighted degree of zero. For example, in your graph, node '2123271' has two edges whose weights sum to 0:

>>> G.edges('2123271', data=True)
[('2123271', '1712899', {'weight': -1L}),
 ('2123271', '890839', {'weight': 1L})]
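
To see where the exception comes from, here is a rough sketch of the per-edge division stochastic_graph performs for a DiGraph (an illustration of the idea, not the library's exact code):

# Weighted out-degree of the offending node: -1 + 1 == 0
deg = G.out_degree('2123271', weight='weight')

# stochastic_graph divides each edge weight by the source node's
# weighted out-degree; with deg == 0 this raises ZeroDivisionError
for _, v, d in G.edges('2123271', data=True):
    d['weight'] = d['weight'] / deg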

Replacing each negative or zero edge weight in your graph with a small positive weight lets pagerank run:

In [1]: import networkx as nx
In [2]: G = nx.read_graphml("your_graph.graphml")
In [3]: defaultEdgeWeight = 0.01
In [4]: for u, v, d in G.edges(data=True):
   ...:     if d['weight'] <= 0:
   ...:         G[u][v]['weight'] = defaultEdgeWeight
In [5]: P = nx.pagerank(G)

Of course, pagerank didn't converge after 102 iterations, but that's another issue.
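
If convergence is the sticking point, nx.pagerank takes max_iter and tol parameters you can loosen; the specific values below are illustrative guesses, not tuned recommendations:

# max_iter and tol are standard nx.pagerank parameters; these particular
# values are just a guess for illustration
P = nx.pagerank(G, max_iter=500, tol=1e-6)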

mdml
  • Thank you for your response. Is using `pagerank_scipy` good enough, or does it sound like garbage-in, garbage-out? Specifically, can I keep my graph with the negative weights and use `pagerank_scipy` to get meaningful results? – Dexter Nov 12 '13 at 17:41
  • `pagerank_scipy` should be fine if you run it long enough. However, I'm not sure what you hope to learn using negative weights. PageRank is fundamentally a random walk with restart on a graph, where edge weights determine the probability of visiting particular neighbors. Since probabilities lie in [0, 1], I'm not sure how you would interpret a negative weight. It should run, though. – mdml Nov 12 '13 at 20:27
  • Would it make more sense if I added a small default weight, as you suggested, when the weights are negative? I am more worried about the interpretation of the results; PageRank is essentially just a means to an end. – Dexter Nov 13 '13 at 04:59
  • Yes, I think you should transform your negative-weight edges to make them positive (even something as simple as taking the absolute value), or just remove them. The best way to do this depends a lot on your data, however. – mdml Nov 13 '13 at 14:44
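
For what it's worth, the absolute-value transform suggested in the last comment is a one-liner (a sketch; whether it's meaningful depends on what your negative weights represent):

# Make every weight non-negative before running PageRank; alternatively,
# drop the negative-weight edges entirely
for u, v, d in G.edges(data=True):
    d['weight'] = abs(d['weight'])
P = nx.pagerank(G)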