I'm new to Python, and i'm trying to calculate Page Rank vector according to this equation in Python:
Where Pi(k) is Page-rank vector after k-Th iteration, G is the Google matrix, H is Hyperlink matrix, a is a dangling node vector, alpha = 0.85 and e is vector of ones.
The calculation with G takes a lot of time, while using the Hyperlink matrix H, which is sparse matrix, should take significantly less time.
Here's my code:
for i in range(1, k_steps+1):
for j in range(0, len(dictionary_urls)):
for k in range(0, len(dictionary_urls)):
if matrix_H[k][j] != 0:
matrix_pi_k[i][j] += matrix_pi_k[i-1][k] * float(matrix_H[k][j])
alpha_pi_k_a += matrix_pi_k[i-1][k]*float(vector_a[k])
alpha_pi_k_a = alpha_pi_k_a * float(alpha)
alpha_pi_k_a = alpha_pi_k_a + float((1- alpha))
alpha_pi_k_a = alpha_pi_k_a / float(len(dictionary_urls))
matrix_pi_k[i][j] = matrix_pi_k[i][j] * float(alpha)
matrix_pi_k[i][j] = matrix_pi_k[i][j] + float(alpha_pi_k_a)
alpha_pi_k_a = 0
k_steps is the number of iteration needed.
dictionary_links contains all the URLs.
After code execution, matrix_pi_k should have all the Pi vector's
I calculated all the variables that needed. I got a run time using H matrix is almost equal to run time using G matrix, although, in theory it should be different.
Why? And what should I change to reduce the run time?
Thank you.