
Put simply, I'm scraping web data with Scrapy.

I need to check the scraped data against keywords or regular expressions and, if there is a match, pass the item through a pipeline to a database. If there is no match, the item should be dropped.

My question is: should I (and can I) do this from within Scrapy, and if so, do you have any high-level suggestions for me to research further?

Or should I simply perform this task outside of Scrapy?

Ideally I'd like to do it all from within Scrapy.

P.S. I'm new to Scrapy, Python and Stack Overflow. I have researched this as far as I could and found no definitive answers or guidance.

Stuart
  • This looks like a good fit for a custom Item Pipeline: https://doc.scrapy.org/en/latest/topics/item-pipeline.html. Item pipelines can be used to do post-processing tasks such as the one you mentioned. – Valdir Stumm Junior Jan 19 '17 at 11:36
  • I thought so, thank you. I'll point my research that way. – Stuart Jan 19 '17 at 14:38
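
For anyone following the item pipeline suggestion in the comment above, here is a minimal sketch of what such a pipeline might look like. It is only an illustration under stated assumptions: the class name `KeywordFilterPipeline`, the regex, the item fields `text` and `url`, the SQLite table `matches` and the file name `matches.db` are all hypothetical, and SQLite simply stands in for whatever database is actually used. `process_item`, `open_spider`, `close_spider` and `DropItem` are the standard Scrapy pipeline hooks.

```python
import re
import sqlite3

try:
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in so this sketch also runs without Scrapy installed."""


class KeywordFilterPipeline:
    # Hypothetical pattern and field name; adjust both to your own items.
    PATTERN = re.compile(r"\b(python|scrapy)\b", re.IGNORECASE)
    FIELD = "text"

    def __init__(self, db_path="matches.db"):
        # Hypothetical default path; Scrapy calls the constructor without arguments.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the database.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS matches (url TEXT, text TEXT)"
        )

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and clean up.
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        text = item.get(self.FIELD) or ""
        if not self.PATTERN.search(text):
            # Raising DropItem stops the item from going any further.
            raise DropItem("no keyword match")
        self.conn.execute(
            "INSERT INTO matches (url, text) VALUES (?, ?)",
            (item.get("url"), text),
        )
        return item
```

To enable a pipeline like this, it would be listed in the project's `settings.py` under `ITEM_PIPELINES`, for example `ITEM_PIPELINES = {"myproject.pipelines.KeywordFilterPipeline": 300}`, where the module path `myproject.pipelines` is again an assumption about the project layout.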
