I'm implementing some iterative algorithms for computing the PageRank of a webgraph, and I'm having trouble finding the best way to store some matrices in memory.
I have an `n x n` matrix `B` that represents the webgraph (`B[i,j] = 1/outdegree[j]` if there's an arc from `j` to `i`, and `0` otherwise, where `outdegree[j]` is the number of outgoing arcs of node `j`). I'm storing `B` as a `scipy.sparse.dok_matrix`, since of course it has mostly `0` entries. The problem is that I have to compute many matrix-vector products of the type `Px`, where

    P = B + (1/n)*e*d^T
and `e` is the all-ones vector, while `d` is a boolean vector with a `1` in component `j` if `outdegree[j] > 0`. Basically, `e*d^T` is a linear algebra "trick" to write an `n x n` matrix whose columns are either all `1`s or all `0`s, depending on whether the corresponding entry of `d` is `1` or `0`.
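To make the setup concrete, here's a tiny worked example (the toy graph and the variable names are mine). I build `P` densely with an explicit Python loop, which is precisely what I can't afford for a real webgraph:

```python
import numpy as np
import scipy.sparse as sp

# Toy webgraph on n = 3 nodes with arcs 0 -> 1, 0 -> 2, 1 -> 2;
# node 2 has no outgoing arcs.
n = 3
outdegree = np.array([2, 1, 0])

# B[i, j] = 1/outdegree[j] if there is an arc j -> i, else 0.
B = sp.dok_matrix((n, n))
B[1, 0] = 1 / outdegree[0]
B[2, 0] = 1 / outdegree[0]
B[2, 1] = 1 / outdegree[1]

# d[j] = 1 iff outdegree[j] > 0.
d = (outdegree > 0).astype(float)

# Dense P = B + (1/n)*e*d^T, built column by column: every column j
# with d[j] == 1 gets 1/n added to all of its entries.
P = B.toarray()
for j in range(n):
    if d[j] == 1.0:
        P[:, j] += 1.0 / n
```

This gives the right `P` at toy sizes, but it stores all `n^2` entries, and the loop is obviously not the numpy way of doing it.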
So I'm struggling with two, not totally independent, things:
- How do I achieve the same "trick" in numpy? `e*d.T` simply computes the scalar product, while I want a matrix. I guess it takes some clever use of broadcasting and slicing, but I'm still new to numpy and can't figure it out.
- If I simply define `P` as above (supposing I've solved point 1.), I lose the memory advantage I gained by storing `B` as a sparse matrix, and suddenly I need to store `n^2` floats. Besides, the matrix I'm adding to `B` is very redundant (it has only two kinds of columns), so there must be a better way than keeping the whole thing in memory.

Any suggestions? Keep in mind that whatever representation I use, it has to easily allow computing `P.dot(x)` for an arbitrary vector `x`.
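To pin down that requirement, here is the naive baseline (toy data and names are mine) that any memory-friendlier representation would have to reproduce:

```python
import numpy as np
import scipy.sparse as sp

# A toy instance: n = 3, nodes 0 and 1 have outgoing arcs, node 2 doesn't.
n = 3
B = sp.dok_matrix((n, n))
B[1, 0] = 0.5
B[2, 0] = 0.5
B[2, 1] = 1.0
d = np.array([1.0, 1.0, 0.0])   # d[j] = 1 where outdegree[j] > 0

# Naive approach: materialize all n^2 entries of P, then multiply.
P = B.toarray()
for j in range(n):
    if d[j] == 1.0:
        P[:, j] += 1.0 / n

x = np.array([0.2, 0.3, 0.5])   # an arbitrary vector
y = P.dot(x)                    # the product I need to compute many times
```

Whatever clever storage scheme replaces the dense `P` should return exactly this `y` given the same `B`, `d`, and `x`.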