4

I am working with data from the Mathematics Genealogy Project. I collect all the information about students and advisors and run queries on that data. To be precise, I crawl all the HTML pages starting from the root URL of the Mathematics Genealogy Project, http://www.genealogy.ams.org/, extract the information I need, and query it. For experimental purposes, I need more data that is available on the web in a similar format. Can anybody suggest good websites that I can crawl for interesting information? Data other than genealogy is also welcome, but it should have at least some hierarchy. Thanks for all your suggestions.
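
To make the kind of crawl I mean concrete, here is a minimal sketch, not my actual framework. It assumes each page links to its "child" pages through plain `<a href>` tags, and it uses hypothetical in-memory pages (`PAGES`) in place of real HTTP requests; the names `LinkExtractor`, `extract_links`, and `crawl_hierarchy` are made up for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl_hierarchy(root, fetch):
    """Breadth-first crawl from root.

    fetch(url) is assumed to return the page HTML, or None if the
    page is unavailable. Returns a dict mapping each crawled page to
    the pages first discovered from it.
    """
    children = {}
    seen = {root}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        children[url] = []
        for link in extract_links(html):
            if link not in seen:  # skip pages already discovered (avoids cycles)
                seen.add(link)
                children[url].append(link)
                queue.append(link)
    return children


# Hypothetical pages standing in for a real site.
PAGES = {
    "advisor.html": '<a href="student1.html">S1</a> <a href="student2.html">S2</a>',
    "student1.html": '<a href="grandstudent.html">G</a>',
    "student2.html": "<p>No students listed</p>",
    "grandstudent.html": '<a href="advisor.html">Back to advisor</a>',
}

tree = crawl_hierarchy("advisor.html", PAGES.get)
```

Any site whose pages can be reduced to a parent-to-children mapping like `tree` would be a suitable test case. For a real site, `fetch` would be replaced by a function that downloads the page and resolves relative links.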

Anu
  • Is there a reason that you can't write a generator to generate test data? I'm not quite sure I understand what you're trying to do... – FrustratedWithFormsDesigner Dec 03 '10 at 20:48
  • I have designed a framework that can crawl sites, extract useful fields from their HTML pages, and query them. The framework was designed for pages like those of the Mathematics Genealogy Project, so I need a similar site to test whether the framework works on other sites too. – Anu Dec 03 '10 at 20:59

1 Answer

1

There is a list of such sites at http://en.wikipedia.org/wiki/Academic_genealogy. One example is http://academictree.org/.

mwhite