I have a working Scrapy spider deployed on an Amazon EC2 instance (c4.xlarge), running under scrapyd.
No matter what I do, I can't seem to get past ~200 processed items per minute (according to the Scrapy logs).
I've tried:

- playing around with the scrapyd concurrency settings;
- lowering scrapyd's `max_proc_per_cpu` to 1 to avoid context switching;
- running separate Scrapy crawlers from the command line.

None of it helped: all of them together still produce the same aggregate of roughly 200 items per minute. (A sketch of the kind of settings I've been tuning is below.)
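For context, these are the kinds of knobs I've been adjusting in `settings.py` (the values shown are illustrative, not my exact config):

```python
# settings.py -- illustrative values, not my exact config

# Total concurrent requests the downloader performs per spider (Scrapy default: 16)
CONCURRENT_REQUESTS = 100

# Concurrent requests allowed per target domain (Scrapy default: 8)
CONCURRENT_REQUESTS_PER_DOMAIN = 50

# No artificial delay between requests (Scrapy default: 0)
DOWNLOAD_DELAY = 0

# Make sure AutoThrottle isn't silently capping throughput (default: False)
AUTOTHROTTLE_ENABLED = False
```

Raising these well above the defaults changes the pages-per-minute figure, but the items-per-minute figure doesn't move.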
I can see from the Scrapy logs that the aggregate number of web pages hit increases almost linearly, but the scraped items per minute stay stuck at around 200.
Any tips? Has anybody come across this before? Have I missed a setting somewhere?