
I have extracted a set of URLs that are all on the same topic. I want to find the links between them so that I can build a graph in Python. The URLs (websites) would be the nodes, and the links between them would be the edges. Please help.

nishat

1 Answer


You can follow this simple approach:

Parse the web pages using BeautifulSoup[1] and store the href attribute of each anchor tag in a dictionary of lists keyed by page (call it lst). So, if a web page (say web1) links to 3 other web pages (with links href1, href2, href3), then:

lst = {}
lst['web1'] = ['href1', 'href2', 'href3']
# so lst['web1'][0] == 'href1', lst['web1'][1] == 'href2', lst['web1'][2] == 'href3'

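Similarly, parse the other web pages and create lists for them. Note that web1 can itself appear as an href in the list of some other page webx; each such occurrence is an edge from webx to web1. Hope you get the idea.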
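To make the idea concrete, here is a minimal sketch of building lst and turning it into an edge list. It is only an illustration under some assumptions: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs, it works on HTML strings that are already in memory rather than fetching anything, and the names LinkCollector, build_link_graph, to_edges and the example.com pages are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_link_graph(pages):
    """Map each URL in `pages` to the sorted list of other known URLs it links to.

    `pages` is a dict {url: html_text}. Links that point outside the set,
    and links from a page to itself, are dropped.
    """
    graph = {}
    for url, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        targets = set()
        for href in parser.hrefs:
            # Resolve relative links against the page URL and strip "#fragment".
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute in pages and absolute != url:
                targets.add(absolute)
        graph[url] = sorted(targets)
    return graph


def to_edges(graph):
    """Flatten the dictionary of lists into (source, target) pairs."""
    return [(src, dst) for src, targets in graph.items() for dst in targets]


# Hypothetical sample data standing in for the downloaded pages.
pages = {
    "http://example.com/a": '<a href="/b">B</a> <a href="http://example.com/c#top">C</a>',
    "http://example.com/b": '<a href="http://example.com/a">A</a> <a href="http://other.org/">X</a>',
    "http://example.com/c": "<p>no links</p>",
}

lst = build_link_graph(pages)
edges = to_edges(lst)
```

The resulting edges list can then be handed to a graph library; with networkx, for instance, something like networkx.Graph().add_edges_from(edges) should give an undirected graph. With BeautifulSoup, the LinkCollector step would be replaced by iterating over soup.find_all('a', href=True).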

[1] http://www.crummy.com/software/BeautifulSoup/

theharshest
  • Thanks for the answer, theharshest. I now have a web search result graph, and I want to do graph clustering on it. I am currently using networkx, but it doesn't provide any... Can anyone suggest a good graph clustering method I can use, keeping in mind that my graph is unweighted and undirected? I am also having trouble with the igraph package. What are its requirements? Please help. – nishat Feb 28 '13 at 07:35