I'm interested in building a program that fetches all of the latest articles in a specific domain ("computer science") from a specific set of websites ("ScienceDirect", for example). As you know, some websites publish a page for each research article, such as: http://www.sciencedirect.com/science/article/pii/S108480451400085X Each such page contains the information for one specific article.
I'd like to know what the best open-source tool for this purpose is. General web crawlers (such as Apache Nutch) provide a framework for crawling the whole web, but in my case I need a website-specific crawler.
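To make the question concrete, here is a minimal sketch (Python, using `requests` and `BeautifulSoup`) of the kind of site-specific crawler I have in mind. The listing URL and the HTML selectors below are made-up placeholders, not ScienceDirect's actual markup; only the `/science/article/pii/` link pattern comes from the example URL above. In practice I'd want a proper framework rather than this hand-rolled script.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical "latest articles" listing page for a journal; the real URL
# would depend on the target site and is a placeholder here.
LISTING_URL = "http://www.sciencedirect.com/LATEST-ARTICLES-PLACEHOLDER"


def fetch_article_links(listing_url):
    """Collect links to individual article pages from a listing page."""
    resp = requests.get(listing_url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Assumed pattern: article pages live under /science/article/pii/
    return {
        a["href"]
        for a in soup.find_all("a", href=True)
        if "/science/article/pii/" in a["href"]
    }


def parse_article(article_url):
    """Extract basic metadata from a single article page."""
    resp = requests.get(article_url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Placeholder selectors: the real element names/classes differ per site.
    title = soup.find("h1")
    abstract = soup.find("div", class_="abstract")
    return {
        "url": article_url,
        "title": title.get_text(strip=True) if title else None,
        "abstract": abstract.get_text(strip=True) if abstract else None,
    }


if __name__ == "__main__":
    for link in fetch_article_links(LISTING_URL):
        print(parse_article(link))
```

The per-site selectors and listing URLs are exactly the part that a general-purpose crawler like Nutch doesn't handle for me, which is why I'm looking for a tool or framework designed for this kind of targeted, site-specific extraction.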