I am working with data from the Mathematics Genealogy Project. I collect all the information about students and advisors and run queries on that data. To be precise, I crawl all the HTML pages starting from the project's root URL, http://www.genealogy.ams.org/, extract the information I need, and query it. For experimental purposes, I need more data on the web that is available in a similar format. Can anybody suggest good websites that I can crawl for interesting information? Data other than genealogy is also welcome, but it should have at least some hierarchy. Thanks for all your suggestions.
-
Is there a reason that you can't write a generator to generate test data? I'm not quite sure I understand what you're trying to do... – FrustratedWithFormsDesigner Dec 03 '10 at 20:48
-
I have designed a framework that can crawl sites, extract useful fields from HTML pages, and query them. The framework is designed for pages like those of the Mathematics Genealogy Project, so I need a similar site to test whether it also works elsewhere. – Anu Dec 03 '10 at 20:59
1 Answer
1
There is a list of such sites at http://en.wikipedia.org/wiki/Academic_genealogy. For instance, http://academictree.org/.
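
If it helps when trying the framework on one of these sites, here is a minimal sketch of a breadth-first crawl that records a parent-to-children hierarchy of pages. It is not your framework and makes no claim about how any of these sites are structured: `crawl_hierarchy`, `LinkExtractor`, and the in-memory `PAGES` dictionary are hypothetical names used only for illustration, and the `fetch` argument is an assumption that lets you swap in real HTTP requests or canned test pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl_hierarchy(root_url, fetch, max_pages=100):
    """Breadth-first crawl from root_url; returns {parent_url: [child_urls]}.

    `fetch` is a caller-supplied function mapping a URL to its HTML text,
    so the same crawl logic can be pointed at any site or at test data.
    """
    tree = {}
    seen = {root_url}
    queue = deque([root_url])
    while queue and len(tree) < max_pages:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        children = []
        for href in parser.links:
            child = urljoin(url, href)  # resolve relative links
            if child not in seen:       # each page gets exactly one parent
                seen.add(child)
                children.append(child)
                queue.append(child)
        tree[url] = children
    return tree


# Hypothetical in-memory "site" standing in for real pages.
PAGES = {
    "http://example.test/a": '<a href="b">B</a> <a href="c">C</a>',
    "http://example.test/b": '<a href="d">D</a> <a href="a">back</a>',
    "http://example.test/c": "",
    "http://example.test/d": "",
}

tree = crawl_hierarchy("http://example.test/a", PAGES.get)
```

Because a page is attached only to the first parent that links to it, the result is a tree even when the site contains back-links. For a real site you would pass a `fetch` that downloads the page, and you should check the site's terms and robots.txt before crawling.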

mwhite