I am planning to use web crawling in an application I am currently working on. I did some research on Nutch and ran some preliminary tests with it, but then I came across Scrapy. From my preliminary research and a read through the Scrapy documentation, I got the impression that it can only capture structured data (you have to specify the element, e.g. the div, from which you want to extract data). The backend of the application I am developing is Python based, and I understand Scrapy is also written in Python; some people have suggested that Scrapy is better than Nutch.
My requirement is to capture the data from more than 1,000 different web pages and then search that information for relevant keywords (a rough sketch of what I have in mind is below). Is there any way Scrapy can satisfy this requirement?
1) If yes, can you point me to an example of how it can be done?
2) Or is Nutch + Solr better suited to my requirement?
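To make the requirement concrete, here is a rough sketch of the kind of spider I am hoping Scrapy supports. The spider name, start URLs, and keyword list are placeholders I made up, and pulling all body text via XPath is just one way I imagine doing it:

```python
import scrapy


class KeywordSpider(scrapy.Spider):
    # Placeholder name and URLs; in reality start_urls would hold my ~1000 pages
    name = "keyword_spider"
    start_urls = ["https://example.com/page1", "https://example.com/page2"]
    keywords = ["keyword1", "keyword2"]  # the terms I want to search for

    def parse(self, response):
        # Grab all visible text from the page instead of one specific div
        page_text = " ".join(response.xpath("//body//text()").getall()).lower()
        # Record which of my keywords appear on this page
        matched = [kw for kw in self.keywords if kw.lower() in page_text]
        if matched:
            yield {"url": response.url, "keywords_found": matched}
```

I imagine running it with something like `scrapy runspider keyword_spider.py -o results.json`. If this is roughly the right approach, a pointer to a fuller example would help.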