I'm implementing some iterative algorithms for computing the PageRank of a webgraph, and I'm having trouble finding the best way to store some matrices in memory.
I have an `n x n` matrix `B` that represents the webgraph (`B[i,j] = 1/outdegree[j]` if there's an arc from `j` to `i`, and `0` otherwise, where `outdegree[j]` is the number of outgoing arcs of node `j`). I'm storing `B` as a `scipy.sparse.dok_matrix`, since of course it has mostly `0` entries. The problem is that I have to compute many matrix-vector products of the type `Px`, where

    P = B + (1/n)*e*d^T
and `e` is the all-ones vector, while `d` is a boolean vector with a `1` in component `j` if `outdegree[j] > 0`. Basically, `e*d^T` is a linear algebra "trick" to write an `n x n` matrix whose columns are either all `1`s or all `0`s, depending on whether the corresponding entry of `d` is `1` or `0`.
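To make the setup concrete, here's a tiny worked example (the toy graph and the variable names are mine). I build `P` densely with an explicit Python loop, which is precisely what I can't afford for a real webgraph:

```python
import numpy as np
import scipy.sparse as sp

# Toy webgraph on n = 3 nodes with arcs 0 -> 1, 0 -> 2, 1 -> 2;
# node 2 has no outgoing arcs.
n = 3
outdegree = np.array([2, 1, 0])

# B[i, j] = 1/outdegree[j] if there is an arc j -> i, else 0.
B = sp.dok_matrix((n, n))
B[1, 0] = 1 / outdegree[0]
B[2, 0] = 1 / outdegree[0]
B[2, 1] = 1 / outdegree[1]

# d[j] = 1 iff outdegree[j] > 0.
d = (outdegree > 0).astype(float)

# Dense P = B + (1/n)*e*d^T, built column by column: every column j
# with d[j] == 1 gets 1/n added to all of its entries.
P = B.toarray()
for j in range(n):
    if d[j] == 1.0:
        P[:, j] += 1.0 / n
```

This gives the right `P` at toy sizes, but it stores all `n^2` entries, and the loop is obviously not the numpy way of doing it.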
So I'm struggling with two, not totally independent, things:
- How do I achieve the same "trick" in numpy? `e*d.T` simply computes the scalar product, while I want a matrix. I guess it takes some clever use of broadcasting and slicing, but I'm still new to numpy and can't figure it out.
- If I simply define `P` as above (supposing I've solved point 1.), I lose the memory advantage I gained by storing `B` as a sparse matrix, and suddenly I need to store `n^2` floats. Besides, the matrix I'm adding to `B` is very redundant (it has only two kinds of columns), so there must be a better way than keeping the whole thing in memory.

Any suggestions? Keep in mind that whatever representation I use, it has to easily allow computing `P.dot(x)` for an arbitrary vector `x`.
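To pin down that requirement, here is the naive baseline (toy data and names are mine) that any memory-friendlier representation would have to reproduce:

```python
import numpy as np
import scipy.sparse as sp

# A toy instance: n = 3, nodes 0 and 1 have outgoing arcs, node 2 doesn't.
n = 3
B = sp.dok_matrix((n, n))
B[1, 0] = 0.5
B[2, 0] = 0.5
B[2, 1] = 1.0
d = np.array([1.0, 1.0, 0.0])   # d[j] = 1 where outdegree[j] > 0

# Naive approach: materialize all n^2 entries of P, then multiply.
P = B.toarray()
for j in range(n):
    if d[j] == 1.0:
        P[:, j] += 1.0 / n

x = np.array([0.2, 0.3, 0.5])   # an arbitrary vector
y = P.dot(x)                    # the product I need to compute many times
```

Whatever clever storage scheme replaces the dense `P` should return exactly this `y` given the same `B`, `d`, and `x`.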