I'm interested in extracting semantic data (simple template stuff) from webpages and other sources that aren't currently semantically aware. I've written crawlers and manual parsers before in a bunch of different languages, but there always seems to be a lot of boilerplate and page-specific code. Do you guys know of any platforms or frameworks that simplify the process (open source only, please)?
I'll be writing one myself if I can't find one, so links to similar systems or framework suggestions would also be appreciated.