
I need to run many spiders (~20-50) in parallel on the same server. Some of my spiders run for more than two days, and sometimes I need to start a new one before all the running processes have finished. As I understand it, this is possible with scrapyd (a separate daemon process) and with CrawlerProcess (a Scrapy class). Or maybe Celery is more suitable here? (I'd like to use Python 3.) What are the specifics of each approach, and which one is better for my project?
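For reference, here is a minimal sketch of the CrawlerProcess approach as I understand it (the project and spider class names are placeholders):

    # Minimal sketch: run several spiders in one process with CrawlerProcess.
    # "myproject" and the spider classes are placeholders for a real Scrapy project.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.one import SpiderOne  # hypothetical spider classes
    from myproject.spiders.two import SpiderTwo

    process = CrawlerProcess(get_project_settings())
    process.crawl(SpiderOne)
    process.crawl(SpiderTwo)
    process.start()  # starts the Twisted reactor and blocks until all crawls finish

My concern with this is that `process.start()` blocks and the spiders have to be registered up front, which seems to make it hard to launch a new spider while the long-running ones are still going.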

Rainmaker

1 Answer


As mentioned in https://github.com/scrapy/scrapyd/issues/143, scrapyd is going to support Python 3. Regardless of that, Celery is a good choice.
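A minimal sketch of the Celery route, assuming a Redis broker and a regular Scrapy project on disk (the broker URL and project path below are placeholders): each task just shells out to `scrapy crawl`, so every spider runs in its own OS process and new ones can be queued at any time.

    # tasks.py -- sketch only; broker URL and project path are assumptions
    import subprocess
    from celery import Celery

    app = Celery("crawlers", broker="redis://localhost:6379/0")

    @app.task
    def run_spider(spider_name):
        # cwd must point at the Scrapy project directory so "scrapy crawl" finds it
        subprocess.run(
            ["scrapy", "crawl", spider_name],
            cwd="/path/to/myproject",
            check=True,
        )

You would then queue spiders with `run_spider.delay("some_spider")`, and Celery's worker concurrency setting controls how many run in parallel.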

Mahdi Ne