I'm writing an undergraduate paper on numerical linear algebra applied to a problem of my choice, and I chose computing PageRank with the Power Method.
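For reference, here are the standard Google matrix and Power Method iteration, written with the names used in the code below: n pages, link matrix a, uniform teleport matrix b, and damping factor d = 0.85. Column j of a spreads page j's rank evenly over the pages it links to, so each column must sum to 1. A page with no outgoing links is usually given the uniform column (1/n)·1.

```latex
\begin{aligned}
m &= d\,a + (1-d)\,b, \qquad b = \frac{1}{n}\,\mathbf{1}\mathbf{1}^{\mathsf T}, \qquad d = 0.85,\\
v^{(0)} &= \frac{1}{n}\,\mathbf{1}, \qquad v^{(k+1)} = m\,v^{(k)}, \qquad
\text{stop when } \bigl\| v^{(k+1)} - v^{(k)} \bigr\|_1 < 0.001 .
\end{aligned}
```

Because m is then column-stochastic with all entries positive, every iterate stays a probability vector, and the iteration converges to the unique PageRank vector.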
I've written Python code that implements the Power Method to compute the PageRank vector of a transition matrix that I specified by hand. That isn't very useful on its own, because I actually want to compute the PageRank of a large website, where the matrices will be huge.
Is there a way to add a crawler (or surfer) to my Python code that generates the transition matrix for any website and then runs the algorithm on it? I know it's pretty simple in MATLAB, but I don't want to pay for it just for this.
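One way to get the link structure without MATLAB is a small breadth-first crawler built only on Python's standard library (`urllib.request`, `urllib.parse`, `html.parser`). This is a minimal sketch, not a production crawler. The names `LinkParser`, `fetch_html`, and `crawl` are hypothetical. It assumes pages are served over HTTP(S), that links appear as `<a href>` tags, and that only pages on the starting host should be followed. It does not check robots.txt (the standard library's `urllib.robotparser` can do that) or throttle requests, and pages whose links are generated by JavaScript will look link-free.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href targets of <a> tags, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links and drop any "#fragment" part.
                    url, _ = urldefrag(urljoin(self.base_url, value))
                    self.links.append(url)


def fetch_html(url):
    """Download a page and decode it as text (undecodable bytes are replaced)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch=fetch_html):
    """Breadth-first crawl restricted to start_url's host (hypothetical helper).

    Returns a dict mapping each successfully fetched page to the distinct
    same-host pages it links to. Self-links are ignored, and links to pages
    that were never fetched are dropped so the graph is closed.
    """
    host = urlparse(start_url).netloc
    graph = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable page: leave it out of the graph
        parser = LinkParser(url)
        parser.feed(html)
        out = []
        for link in parser.links:
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or parsed.netloc != host:
                continue  # skip mailto:, javascript:, and other sites
            if link != url and link not in out:
                out.append(link)
            if link not in seen:
                seen.add(link)
                queue.append(link)
        graph[url] = out
    return {page: [t for t in out if t in graph] for page, out in graph.items()}
```

For example, `crawl("https://example.com/", max_pages=100)` would return a dict from each visited URL to the same-site URLs it links to. The `fetch` parameter is there so the crawl can be tried on canned HTML instead of the live network.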
```python
import numpy as np

# Original transition matrix a: column j holds the probability of following
# a link from page j to each page i.
a = np.array([
    [0, 0,     0,     0,     0,     0],
    [1, 0,     1 / 3, 1 / 3, 1 / 4, 0],
    [0, 1 / 3, 0,     1 / 3, 1 / 4, 0],
    [0, 1 / 3, 1 / 3, 0,     1 / 4, 0],
    [0, 1 / 3, 1 / 3, 1 / 3, 0,     0],
    [0, 0,     0,     0,     1 / 4, 0]])
n = a.shape[0]
print('a:\n', a)

# Page 6 (last column) has no outgoing links, so its column is all zeros.
# Treat such "dangling" pages as linking to every page with equal
# probability; otherwise the columns do not all sum to 1 and rank leaks away.
dangling = a.sum(axis=0) == 0
a[:, dangling] = 1.0 / n

# Google matrix with damping factor d = 0.85: follow a link with
# probability d, jump to a uniformly random page with probability 1 - d.
d = 0.85
b = np.ones((n, n)) / n
m = d * a + (1 - d) * b
print('m:\n', m)

# Start from the uniform probability vector.
v = np.ones(n) / n
print('v:\n', v)


def pagerank(m, v, tol=0.001, max_iter=1000):
    """Apply m to v repeatedly until the L1 change drops below tol."""
    for count in range(1, max_iter + 1):
        new_v = m @ v
        new_v /= new_v.sum()  # guard against rounding drift in the total
        diff = np.abs(new_v - v).sum()
        print('count', count, 'sum(abs(m@v - v))', diff)
        v = new_v
        if diff < tol:
            break
    return v


result = pagerank(m, v)

# Print pages from highest to lowest rank (pages numbered from 1).
for page in np.argsort(result)[::-1]:
    print('page', page + 1, 'rank', result[page])
```
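The code above builds the dense matrices b and m, which need n² entries and will not fit in memory for a large site. Since b = (1/n)·11ᵀ and v sums to 1, the product b·v is just (1/n)·1, so each iteration only has to touch the actual links. Below is a minimal pure-Python sketch under two assumptions: `links[j]` lists the 0-based indices of the distinct pages that page j links to, and pages without outgoing links are treated as linking to every page (a common convention). `pagerank_sparse` is a hypothetical name.

```python
def pagerank_sparse(links, d=0.85, tol=1e-3, max_iter=1000):
    """Power method on the Google matrix without ever forming it.

    links[j] lists the pages (0-based indices) that page j links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    Returns the rank vector and the number of iterations performed.
    """
    n = len(links)
    v = [1.0 / n] * n
    for count in range(1, max_iter + 1):
        new_v = [0.0] * n
        dangling_mass = 0.0
        for j, targets in enumerate(links):
            if targets:
                share = v[j] / len(targets)  # column j of a times v[j]
                for i in targets:
                    new_v[i] += share
            else:
                dangling_mass += v[j]
        # m v = d * (a v + dangling_mass / n) + (1 - d) / n, using sum(v) == 1.
        base = d * dangling_mass / n + (1 - d) / n
        new_v = [d * x + base for x in new_v]
        diff = sum(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if diff < tol:
            break
    return v, count


# The 6-page example from the question, with pages renumbered from 0.
example_links = [[1], [2, 3, 4], [1, 3, 4], [1, 2, 4], [1, 2, 3, 5], []]
ranks, iterations = pagerank_sparse(example_links)
print(iterations, ranks)
```

Each iteration costs time proportional to the number of links rather than n². For speed on a really large graph, the same update can be written as a sparse matrix-vector product with `scipy.sparse`.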
This is my code. At the start I define the matrix a, which plays the role of the website's transition matrix; I made it up from a simple system of 6 pages.
I'd like the code to generate this matrix for any website I choose, starting from its URL.
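Once a crawler has produced a link graph, the matrix a in the layout used above can be built with a small helper. The graph format is an assumption: a dict mapping each page (for example a URL) to the pages it links to. `transition_matrix` is a hypothetical name. It returns plain lists, which `np.array` converts directly, and it leaves pages without outgoing links as all-zero columns, just like page 6 in a.

```python
def transition_matrix(graph):
    """Build the link matrix a from a dict {page: [pages it links to]}.

    Column j gets 1/outdegree(j) in the rows of the pages that page j
    links to. Duplicate links, self-links, and links to pages outside the
    graph are ignored. Returns (pages, a), where pages[i] is the page for
    row/column i.
    """
    pages = sorted(graph)  # fixed order: index i <-> pages[i]
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    a = [[0.0] * n for _ in range(n)]
    for page, targets in graph.items():
        out = {t for t in targets if t in index and t != page}
        for t in out:
            a[index[t]][index[page]] = 1.0 / len(out)
    return pages, a
```

Feeding it the 6-page example as `{1: [2], 2: [3, 4, 5], 3: [2, 4, 5], 4: [2, 3, 5], 5: [2, 3, 4, 6], 6: []}` reproduces the matrix a defined above.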