I have extracted a set of URLs that are all on the same topic. I want to find the links between them so that I can build a graph using Python. The URLs (websites) would be the nodes, and the links between them would be the edges. Please help.
1 Answer
You can follow this simple approach: parse the web pages using BeautifulSoup and store the href attribute of each anchor tag in a dictionary of lists keyed by page (call it lst). So, if a web page (call it web1) links to 3 other web pages (with links href1, href2, href3), then:
lst['web1'][0] = 'href1'
lst['web1'][1] = 'href2'
lst['web1'][2] = 'href3'
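Similarly, parse the other web pages and create lists for them. A page such as web1 can itself appear as an href in another page's list (hrefx for webx), and each such entry is an edge of the graph. Hope you get the idea.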
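A minimal sketch of the idea, under a few assumptions: it uses the standard library's HTMLParser in place of BeautifulSoup so that it runs without extra installs, it works on HTML strings already held in memory instead of fetching anything, and the page URLs, the LinkCollector class and the build_link_graph function are all hypothetical names made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_link_graph(pages):
    """pages maps URL -> HTML text. Returns (lst, edges)."""
    lst = {}
    for url, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        # Resolve relative links against the URL of the page they came from.
        lst[url] = [urljoin(url, href) for href in parser.hrefs]

    # Keep only links that point at another page in the collected set.
    edges = sorted(
        {
            (src, dst)
            for src, targets in lst.items()
            for dst in targets
            if dst in pages and dst != src
        }
    )
    return lst, edges


# Hypothetical sample pages standing in for the downloaded search results.
pages = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="/about">About</a>',
    "http://b.example/": '<a href="http://a.example/">A</a> <a href="http://c.example/">C</a>',
    "http://c.example/": "<p>No links here.</p>",
}

lst, edges = build_link_graph(pages)
for src, dst in edges:
    print(src, "->", dst)
```

The edges list holds (source, target) pairs, so if networkx is installed it can be loaded with Graph.add_edges_from(edges). Links to pages outside the collected set (such as /about above) stay in lst but are left out of the edges.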

theharshest
Thanks for the answer, theharshest. I now have a web search result graph, and I want to do graph clustering on it. I am currently using networkx, but it doesn't provide any. Can anyone suggest a good graph clustering method that I can use, keeping in mind that my graph is unweighted and undirected? I am also having problems with the igraph package. What are its requirements? Please help. – nishat Feb 28 '13 at 07:35