
I want to get the number of walks between vertices v1 and v2 (paths with multiple visits of the same vertices allowed) in a graph.

There is a very neat algorithm outlined in Mark Newman's book, Networks: An Introduction (see this related math.SE question): the number of walks of length n between v1 and v2 is w(v1,v2) = (A**n)[v1,v2], i.e. the (v1, v2) element of the n-th power of the adjacency matrix A.
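As a small sanity check of that formula, here is a minimal sketch on a three-vertex path graph 0-1-2. The toy graph and the helper name `walks_of_length` are made up for illustration and are not from the book.

```python
import numpy as np

# Adjacency matrix of the path graph 0 - 1 - 2 (illustrative toy graph).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

def walks_of_length(A, v1, v2, n):
    """Number of walks of exactly n steps from v1 to v2, read off A**n."""
    return int(np.linalg.matrix_power(A, n)[v1, v2])

# The two walks of length 3 from 0 to 1 are 0-1-0-1 and 0-1-2-1.
print(walks_of_length(A, 0, 1, 3))  # 2
```

This only works while the dense n x n power fits in memory, which is exactly what breaks down for large graphs.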

Now my problem is that my graph is enormous and I can only store it as a sparse matrix. For that reason, I cannot compute the power of A.
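One possible way around this, when only a single pair (v1, v2) matters, is to never form A**n at all: row v1 of A**n can be built by multiplying a single vector by A repeatedly, which keeps memory at roughly the size of A plus one vector. The sketch below assumes `scipy.sparse` is available; the function name `walks_by_length` and the toy matrix are hypothetical, and the counts are held in float64, so they are only exact up to about 2**53.

```python
import numpy as np
import scipy.sparse as sp

def walks_by_length(A, source, target, max_len):
    """Walk counts from source to target for each length 1..max_len.

    Propagates the row vector e_source through A one step at a time,
    so only sparse matrix-vector products are needed.
    """
    At = sp.csr_matrix(A).T.tocsr()   # A^T x is the transpose of x^T A
    x = np.zeros(At.shape[0])
    x[source] = 1.0                   # x = e_source
    counts = []
    for _ in range(max_len):
        x = At @ x                    # x now holds row `source` of the next power of A
        counts.append(x[target])
    return counts

# Toy example: path graph 0 - 1 - 2 stored as a sparse matrix.
A_sparse = sp.csr_matrix(np.array([[0, 1, 0],
                                   [1, 0, 1],
                                   [0, 1, 0]]))
print(walks_by_length(A_sparse, 0, 1, 3))
```

Each step costs time proportional to the number of stored entries of A, so the total work is about max_len times nnz(A) rather than a full matrix-matrix product.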

I tried to use NetworkX, because it can create graphs from sparse matrices. However, I only found how to compute simple paths (walks without revisiting vertices), and it is very slow.

import time
import numpy as np
import networkx as nx


# Max Length of 4, paths/walks between vertex 0 and 1.
len_of_paths=4
pos1=0
pos2=1


# Create random adjacency matrix and graph
np.random.seed(6)
rnd_mat=np.random.rand(250,250)
np.fill_diagonal(rnd_mat,0.1)

A = np.floor(rnd_mat+rnd_mat.transpose())
G = nx.from_numpy_array(A)  # from_numpy_matrix was removed in NetworkX 3.0


# Compute all simple paths using networkx
time_start=time.time()
paths_between=list(nx.all_simple_paths(G, source=pos1, target=pos2, cutoff=len_of_paths))
time_end=time.time()

print('time NetworkX: ',(time_end-time_start))



# Compute all walks using matrix multiplications
# Not possible with sparse matrices?
time_start=time.time()
Amult=A
accumulated_paths=Amult[pos1,pos2]  # walks of length 1
for ii in range(len_of_paths-1):
    Amult=np.dot(Amult,A)  # Amult is now A**(ii+2)
    individual_paths=Amult[pos1,pos2]
    accumulated_paths+=individual_paths
    

time_end=time.time()
print('time MatMulti: ',(time_end-time_start))

print('Simple paths via networkx: ', len(paths_between))
print('Walks via matrix mult: ', accumulated_paths)

time NetworkX: 10.66906476020813

time MatMulti: 0.003000497817993164

So NetworkX is significantly slower.
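Note that the two printed numbers are different quantities: simple paths may not repeat a vertex, walks may. A minimal pure-Python sketch of both counts on a triangle graph makes the gap visible. The function names and the toy adjacency lists are made up for illustration, and the simple-path counter assumes source and target differ.

```python
def count_walks(adj, source, target, max_len):
    """Total number of walks of length 1..max_len (vertices may repeat)."""
    counts = {source: 1}              # counts[v] = walks of current length ending at v
    total = 0
    for _ in range(max_len):
        nxt = {}
        for u, c in counts.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0) + c
        counts = nxt
        total += counts.get(target, 0)
    return total

def count_simple_paths(adj, source, target, max_len):
    """Number of paths of length 1..max_len that never revisit a vertex.

    Assumes source != target.
    """
    def dfs(u, visited, remaining):
        if u == target:
            return 1
        if remaining == 0:
            return 0
        return sum(dfs(v, visited | {v}, remaining - 1)
                   for v in adj[u] if v not in visited)
    return dfs(source, {source}, max_len)

# Triangle graph 0 - 1 - 2 - 0 as adjacency lists.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(count_simple_paths(triangle, 0, 1, 3))  # 2  (0-1 and 0-2-1)
print(count_walks(triangle, 0, 1, 3))         # 5  (1 + 1 + 3 for lengths 1, 2, 3)
```

The walk count uses exact Python integers and touches each edge once per step, whereas the simple-path count has to enumerate paths, which is where the running time of the two approaches diverges.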

Question: Is there any way to speed this process up, or to estimate the number of simple paths or walks between two vertices in a graph?

Mario Krenn
  • Two comments: 1) Use the `numpy.linalg.matrix_power`. Depending on the size of the graph and the maximum allowed path length, you will get speedups on the order of 2-3. – Paul Brodersen Jun 22 '21 at 10:19
  • 2) You are testing your code on a very small graph. If you are only interested in the number of paths between a subset of nodes, the networkx implementation will scale substantially better to larger graphs, whereas the matrix multiplication will simply run out of RAM and choke. – Paul Brodersen Jun 22 '21 at 10:21
  • Thanks @PaulBrodersen. My example here uses small matrices, but I am only interested in huge graphs that I can no longer store in RAM. For those I use sparse matrices, so my question is (reformulated from your comment): how can I perform `matrix_power` for sparse matrices? – Mario Krenn Jun 22 '21 at 12:27
  • The matrix power approach only makes sense if a) the graph is small, or b) you want to know the number of paths between **all** combinations of nodes. **Hence my question: do you really care about all combinations of nodes?** – Paul Brodersen Jun 22 '21 at 14:46
  • The networkx approach on the other hand will be very insensitive to network size. As long as you can load the network into memory, the running time should only depend on the average number of neighbours a node has and the size of the cutoff / maximum path length. – Paul Brodersen Jun 22 '21 at 14:48
  • @PaulBrodersen Thank you, I understand your concerns. I am not interested in all combinations of nodes. However, I am slightly worried about a statement in the documentation of `networkx.algorithms.simple_paths.all_simple_paths`: "A single path can be found in `O(V+E)` time but the number of simple paths in a graph can be very large, e.g. `O(n!)` in the complete graph of order n." Therefore, plain matrix multiplication, which scales with `O(n^(3*p))` (`p` being the path length), seems much more efficient. – Mario Krenn Jun 22 '21 at 15:14
  • But you don't have a complete graph, you have a sparse graph, so the running time will be much, much closer to O(V + E) than O(V!). Give it a try before you knock it. – Paul Brodersen Jun 22 '21 at 15:34
  • However, if you do insist on using sparse matrix power calculations, [here](https://scicomp.stackexchange.com/questions/36922/accurate-way-to-calculate-matrix-powers-and-matrix-exponential-for-sparse-positi) is a discussion of your options. – Paul Brodersen Jun 22 '21 at 15:36
