
I'm writing an undergraduate paper on numerical linear algebra, applying it to a problem of my choice, and I chose the PageRank algorithm, computed with the power method.

I've written Python code that implements the power method to compute the PageRank of a transition matrix that I specified by hand. On its own that isn't very useful, because I actually want to calculate the PageRank of a large website, where the matrices will be huge.
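
For a real website the transition matrix will be huge but almost entirely zeros, so it is worth never building it as a dense array. Below is a minimal sketch, assuming the links are given as two integer arrays `src` and `dst` (page `src[k]` links to page `dst[k]`, pages numbered from 0); `sparse_pagerank` is a hypothetical helper name. It computes each product `m @ v` directly from the edge list with `np.bincount`, so memory grows with the number of links rather than with n². Pages with no out-links are treated as linking to every page, which is one common convention; a version that instead rescales the vector after each step will give somewhat different numbers.

```python
import numpy as np


def sparse_pagerank(src, dst, n, d=0.85, tol=1e-10, max_iter=1000):
    """Power method for PageRank on an edge list (hypothetical helper).

    src, dst: integer arrays; page src[k] links to page dst[k].
    n: number of pages. d: damping factor (probability of following a link).
    Returns a vector of n ranks that sums to 1.
    """
    src = np.asarray(src, dtype=np.intp)
    dst = np.asarray(dst, dtype=np.intp)
    outdeg = np.bincount(src, minlength=n).astype(float)
    dangling = outdeg == 0  # pages with no out-links
    v = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        # Link-following part: each page splits its rank evenly over its out-links.
        share = np.zeros(n)
        share[~dangling] = v[~dangling] / outdeg[~dangling]
        w = np.bincount(dst, weights=share[src], minlength=n)
        # Dangling pages spread their rank uniformly over all pages.
        w += v[dangling].sum() / n
        # Damping: with probability 1 - d the surfer jumps to a random page.
        w = d * w + (1 - d) / n
        if np.abs(w - v).sum() < tol:
            return w
        v = w
    return v


if __name__ == "__main__":
    # The made-up 6-page web from the question, written as an edge list.
    src = [0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    dst = [1, 2, 3, 4, 1, 3, 4, 1, 2, 4, 1, 2, 3, 5]
    print(sparse_pagerank(src, dst, 6))
```

The three terms in the loop are the link-following part, the dangling-page correction and the random jump; together they keep the entries summing to 1 at every step without the full matrix ever being stored.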

Is there a way to add a crawler (or "surfer") to my Python code that generates the transition matrix for any website and then carries out the algorithm? I know it's pretty simple in MATLAB, but I don't want to pay for MATLAB just for this.
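
The Python standard library has no ready-made equivalent of a MATLAB surfer, but a small crawler can be assembled from `urllib` and `html.parser`. Below is a minimal sketch, assuming the pages are plain HTML, that only links between pages on the starting host matter, and that a page budget `max_pages` keeps the matrix manageable; `crawl`, `LinkParser` and `fetch_html` are hypothetical names. It does not honour robots.txt, rate-limit its requests, or run JavaScript, so treat it as a starting point rather than a finished tool. The `fetch` parameter lets you swap in another downloader (or fake pages for testing).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Download a page and decode it as UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=100, fetch=fetch_html):
    """Breadth-first crawl of the pages on start_url's host.

    Returns (urls, links): urls[i] is page i, and links[i] is the set of
    page numbers that page i links to. Links to other hosts, fragments
    (#...) and self-links are ignored.
    """
    host = urlparse(start_url).netloc
    start = urldefrag(start_url)[0]
    index = {start: 0}  # url -> page number
    urls = [start]
    links = [set()]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        i = index[url]
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable page: keep it, with no out-links
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            target = urldefrag(urljoin(url, href))[0]
            if urlparse(target).netloc != host:
                continue  # stay on the same site
            if target not in index:
                if len(urls) >= max_pages:
                    continue  # page budget used up; ignore new pages
                index[target] = len(urls)
                urls.append(target)
                links.append(set())
                queue.append(target)
            if index[target] != i:
                links[i].add(index[target])
    return urls, links
```

A call such as `urls, links = crawl("https://example.com/", max_pages=50)` would return the page list and link sets; column `i` of the transition matrix then gets `1/len(links[i])` in each row listed in `links[i]`, and stays all zeros when `links[i]` is empty.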

```python
import numpy as np

# Original transition matrix a for a made-up 6-page web.
# Column j holds the probabilities of following each link out of page j,
# so a[i, j] is the probability of moving from page j to page i.
# The last column is all zeros: page 6 has no outgoing links (a dangling node).
a = np.array([
    [0,   0,   0,   0,   0,   0],
    [1.0, 0,   1/3, 1/3, 1/4, 0],
    [0,   1/3, 0,   1/3, 1/4, 0],
    [0,   1/3, 1/3, 0,   1/4, 0],
    [0,   1/3, 1/3, 1/3, 0,   0],
    [0,   0,   0,   0,   1/4, 0]])
print('a:\n', a)

# Now make the Google matrix with damping factor 0.85: the surfer follows
# a link with probability 0.85 and jumps to a random page with probability 0.15.
n = a.shape[0]
b = np.ones((n, n)) / n
m = 0.85 * a + 0.15 * b
print('m:\n', m)

# Starting vector v: the uniform distribution over the 6 pages.
v = np.ones(n) / n
print('v:\n', v)

# Power method: apply m to the vector until successive iterates differ by
# less than tol (sum of absolute differences). A loop is used instead of
# recursion so that large problems do not hit Python's recursion limit.
def pagerank(m, v, tol=0.001, max_iter=1000):
    for count in range(1, max_iter + 1):
        w = m @ v
        # Page 6 has no out-links, so m @ v loses probability mass;
        # rescale so that the entries sum to 1 again.
        w = w / w.sum()
        diff = np.abs(w - v).sum()
        print('count', count, 'diff', diff)
        v = w
        if diff < tol:
            break
    return v

result = pagerank(m, v)
# Now print the pages (numbered from 1) in order of decreasing rank.
for page in np.argsort(result)[::-1]:
    print('page', page + 1, 'rank', result[page])
```

This is my code. At the start I define the matrix a, which plays the role of the website's transition matrix; I made this one up based on a simple system of 6 pages.
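
Because the made-up matrix is only 6×6, the power-method answer can be cross-checked against a direct eigendecomposition: the PageRank vector is the eigenvector of the Google matrix for its largest eigenvalue, scaled to sum to 1. Below is a sketch, assuming the damping factor 0.85 multiplies the link matrix `a` (the usual convention) and that the power iterate is rescaled after each step because page 6 has no out-links. Since that column leaks probability, the largest eigenvalue comes out slightly below 1.

```python
import numpy as np

# The made-up 6-page transition matrix (column j = links out of page j).
a = np.array([
    [0, 0,   0,   0,   0,   0],
    [1, 0,   1/3, 1/3, 1/4, 0],
    [0, 1/3, 0,   1/3, 1/4, 0],
    [0, 1/3, 1/3, 0,   1/4, 0],
    [0, 1/3, 1/3, 1/3, 0,   0],
    [0, 0,   0,   0,   1/4, 0]])
n = a.shape[0]
d = 0.85  # damping factor: probability of following a link
m = d * a + (1 - d) * np.ones((n, n)) / n

# Dominant eigenpair from a dense eigendecomposition.
vals, vecs = np.linalg.eig(m)
k = np.argmax(np.abs(vals))
x = np.real(vecs[:, k])
x = x / x.sum()  # scale to a probability vector (this also fixes the sign)

# Power method with rescaling, for comparison.
v = np.full(n, 1 / n)
for _ in range(1000):
    w = m @ v
    w = w / w.sum()
    if np.abs(w - v).sum() < 1e-12:
        break
    v = w

print("largest eigenvalue:", np.real(vals[k]))
print("eigenvector:", x)
print("power method:", w)
print("max difference:", np.abs(x - w).max())
```

For large matrices a full `eig` is far too expensive, which is exactly why the power method is used; the dense check is only practical for toy examples like this one.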

I'd like the code to generate this matrix for any website I want, starting from its URL.
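
Whatever tool collects the links, turning them into the matrix `a` is mechanical: column j gets 1/(number of out-links of page j) in each row that page j links to. Below is a sketch, assuming the links arrive as a dictionary mapping each page number (from 0) to the pages it links to; that format and the name `transition_matrix` are hypothetical. Run on the question's 6-page example, it reproduces the hand-written `a`, including the all-zero column for page 6.

```python
import numpy as np


def transition_matrix(out_links, n):
    """Column-stochastic link matrix from an out-link dictionary (hypothetical helper).

    out_links[j] lists the pages that page j links to (numbered from 0).
    A page with no out-links gives an all-zero column (a dangling node).
    """
    a = np.zeros((n, n))
    for j, targets in out_links.items():
        targets = set(targets) - {j}  # drop duplicate links and self-links
        for i in targets:
            a[i, j] = 1.0 / len(targets)
    return a


# The made-up 6-page web from the question.
out_links = {0: [1], 1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3, 5], 5: []}
a = transition_matrix(out_links, 6)
print(a)
```

For a crawled site with thousands of pages, the same loop can fill a sparse structure instead of `np.zeros((n, n))`, since almost every entry is zero.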

Rooney
  • What exactly would be simple in MATLAB? You don't have to pay for that kind of function anyway: there is Octave, which does basically the same and is open source, and in Python you have NumPy and SciPy. So what precisely is the problem? You have to be more explicit and show us some code... – Gerard Rozsavolgyi Dec 08 '15 at 22:29
  • Using a surfer in MATLAB is simple, and I'd get Octave, but I have a Mac and I don't know how to install it. I want my code to use a crawler to generate a matrix of all the hyperlinks between the pages of a website, which I can then run my code on to calculate the PageRank of the website's pages. I've heard there's a MATLAB command for this, but it's not so easy in Python. – Rooney Dec 09 '15 at 00:11
