The following code runs in 45s when using pure Python.
for iteration in range(maxiter):
    for node in range(n):
        for dest in adjacency_list[node]:
            rs[iteration + 1][dest] += beta * rs[iteration][node] / len(adjacency_list[node])
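For context, this is roughly the setup the snippet assumes; the sizes and edges below are a tiny made-up example, not my real graph:

# Toy sizes for illustration; the real graph has 100 000 nodes.
n, maxiter, beta = 5, 10, 0.85
adjacency_list = [[1, 2], [2], [0, 3], [4], [0]]   # out-neighbours of each node
# One rank vector per iteration: rs[0] is uniform, later vectors start from the
# teleport term (1 - beta) / n and get the link contributions added in the loop.
rs = [[1 / n] * n] + [[(1 - beta) / n] * n for _ in range(maxiter)]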
However, simply by initializing rs as a NumPy ndarray instead of a Python list of lists, the same code takes 145s. I do not really know why NumPy is roughly three times slower with this kind of element-wise indexing.
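To show what I mean by "this array indexing", here is a quick, illustrative micro-benchmark of plain element access from a Python loop (not my actual code):

import timeit
import numpy as np

lst = [[0.0] * 1000 for _ in range(1000)]
arr = np.zeros((1000, 1000))

# Compare per-element access time: Python list of lists vs. NumPy ndarray.
print(timeit.timeit(lambda: lst[500][500], number=10**6))
print(timeit.timeit(lambda: arr[500][500], number=10**6))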
My idea was to vectorize as much as possible, but so far I have only managed to vectorize the multiplication by beta / len(adjacency_list[node]). This version runs in 77s.
beta_over_out_degree = np.array([beta / len(al) for al in adjacency_list])
for iteration in range(1, maxiter + 1):
    r_next = np.full(shape=n, fill_value=(1 - beta) / n)
    f = beta_over_out_degree * r  # contribution each node sends to every out-neighbour
    for i in range(n):
        r_next[adjacency_list[i]] += f[i]
    r = np.copy(r_next)
    rs[iteration] = np.copy(r)
The problem is that adjacency_list is a ragged list of lists: it has 100 000 rows, but each row holds between 1 and 15 destination indices.
The more standard approach of using an adjacency matrix, at least as a dense ndarray, is not an option, since for n = 100 000 an array of shape (n, n) is too large to fit in memory.
Is there any way to vectorize this by using its indices for NumPy advanced indexing (maybe by turning adjacency_list into an ndarray)?
I would also greatly appreciate any other speed tips. Thanks in advance!
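To make the question more concrete, this is the kind of thing I have in mind: flattening adjacency_list into flat edge-index arrays once, then doing a single scatter-add per iteration instead of the Python loop over nodes. I have no idea whether it would actually be faster (untested sketch, reusing adjacency_list, f and r_next from above):

import numpy as np

# Built once, outside the iteration loop.
src_idx = np.concatenate([np.full(len(al), i) for i, al in enumerate(adjacency_list)])
dst_idx = np.concatenate([np.asarray(al) for al in adjacency_list])

# Inside the loop, this would replace "for i in range(n): r_next[adjacency_list[i]] += f[i]";
# np.add.at accumulates correctly even when dst_idx contains repeated indices.
np.add.at(r_next, dst_idx, f[src_idx])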
EDIT: Thanks to @stevemo, I managed to create adjacency_matrix using csr_matrix and to use it for the iterative multiplication. The program now runs in only 2s!
for iteration in range(1, 101):
    rs[iteration] += rs[iteration - 1] * adjacency_matrix
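For anyone interested, this is roughly how adjacency_matrix can be built from adjacency_list so that the multiplication above reproduces the original loop: the weight of each edge is beta / out_degree, and each rs[iteration] row is pre-filled with the teleport term (1 - beta) / n, just like r_next in the version above, so the += only adds the link contributions.

from scipy.sparse import csr_matrix

rows, cols, vals = [], [], []
for node, dests in enumerate(adjacency_list):
    w = beta / len(dests)              # weight of every outgoing edge of this node
    for dest in dests:
        rows.append(node)
        cols.append(dest)
        vals.append(w)

# Sparse (n, n) matrix with entry (node, dest) = beta / out_degree(node);
# rs[iteration - 1] * adjacency_matrix is then a row-vector/matrix product.
adjacency_matrix = csr_matrix((vals, (rows, cols)), shape=(n, n))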