
I need to run many spiders (~20-50) in parallel on the same server. Some of my spiders run for more than two days, and sometimes I need to start a new one before all the running processes have finished. As I understand it, this is possible with scrapyd (a separate daemon process) and with CrawlerProcess (a Scrapy class). Or maybe Celery is more suitable here? (I'd like to use Python 3.) What are the specifics of each approach, and which one is better for my project?
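For reference, here is a minimal sketch of the CrawlerProcess approach as I understand it (the project and spider class names are placeholders):

    # Minimal sketch: run several spiders in one process with CrawlerProcess.
    # "myproject" and the spider classes are placeholders for a real Scrapy project.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.one import SpiderOne  # hypothetical spider classes
    from myproject.spiders.two import SpiderTwo

    process = CrawlerProcess(get_project_settings())
    process.crawl(SpiderOne)
    process.crawl(SpiderTwo)
    process.start()  # starts the Twisted reactor and blocks until all crawls finish

My concern with this is that `process.start()` blocks and the spiders have to be registered up front, which seems to make it hard to launch a new spider while the long-running ones are still going.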

Rainmaker

1 Answer


As mentioned in https://github.com/scrapy/scrapyd/issues/143, scrapyd is going to support Python 3. Regardless of that, Celery is a good choice.
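A minimal sketch of the Celery route, assuming a Redis broker and a regular Scrapy project on disk (the broker URL and project path below are placeholders): each task just shells out to `scrapy crawl`, so every spider runs in its own OS process and new ones can be queued at any time.

    # tasks.py -- sketch only; broker URL and project path are assumptions
    import subprocess
    from celery import Celery

    app = Celery("crawlers", broker="redis://localhost:6379/0")

    @app.task
    def run_spider(spider_name):
        # cwd must point at the Scrapy project directory so "scrapy crawl" finds it
        subprocess.run(
            ["scrapy", "crawl", spider_name],
            cwd="/path/to/myproject",
            check=True,
        )

You would then queue spiders with `run_spider.delay("some_spider")`, and Celery's worker concurrency setting controls how many run in parallel.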

Mahdi Ne