I am using the Scrapy framework to make spiders crawl some webpages. Basically, what I want is to scrape web pages and save them to a database. I have one spider per webpage. But I am having trouble running those spiders so that one spider starts to crawl exactly after another spider finishes crawling. How can that be achieved? Is scrapyd the solution?
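(For illustration only, a minimal sketch of the kind of setup described; the spider names spider1 and spider2 come from the comments below, and the URL and parsing logic are placeholders, not the actual code.)

import scrapy

class Spider1(scrapy.Spider):
    # One spider per webpage; a second class named "spider2" would look the
    # same with its own start URL.
    name = "spider1"
    start_urls = ["http://example.com/page1"]

    def parse(self, response):
        # Yield an item for the pipeline that writes to the database.
        yield {"url": response.url, "title": response.css("title::text").get()}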
1 Answer
scrapyd is indeed a good way to go. The max_proc or max_proc_per_cpu configuration setting can be used to restrict the number of parallel spiders; you then schedule spiders through the scrapyd REST API, for example:
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
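
To make the crawls strictly sequential, one option (a sketch, assuming a default scrapyd install) is to cap scrapyd at a single crawl process in scrapyd.conf, so queued jobs run one after the other:

# scrapyd.conf -- allow only one crawl process at a time
[scrapyd]
max_proc = 1

If you prefer to drive the ordering from a script, something along these lines should work. This is a minimal sketch using the requests library; the project name myproject and the spider names spider1 and spider2 are taken from the question and comments above:

import time
import requests

SCRAPYD = "http://localhost:6800"
PROJECT = "myproject"
SPIDERS = ["spider1", "spider2"]  # one spider per webpage

def wait_until_finished(jobid):
    # Poll scrapyd's listjobs.json until this job shows up under "finished".
    while True:
        jobs = requests.get(SCRAPYD + "/listjobs.json",
                            params={"project": PROJECT}).json()
        if any(job["id"] == jobid for job in jobs.get("finished", [])):
            return
        time.sleep(5)

for spider in SPIDERS:
    # schedule.json returns a job id that can be tracked via listjobs.json
    resp = requests.post(SCRAPYD + "/schedule.json",
                         data={"project": PROJECT, "spider": spider}).json()
    wait_until_finished(resp["jobid"])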

Guy Gavriely
- I have two spiders: spider1 and spider2. Now how do I start? – Nabin Feb 11 '14 at 06:45
- But "scrapy deploy" doesn't work. It fails with: "Usage ===== scrapy deploy [options] [ [target] | -l | -L ] deploy: error: Unknown target: default" – Nabin Feb 11 '14 at 07:54
- And where is the schedule.json file? Or do I have to create one? @Guy Gavriely – Nabin Feb 11 '14 at 08:55
- A browse of the rest of the scrapyd documentation may prove useful: http://scrapyd.readthedocs.org/en/latest/ – Talvalin Feb 11 '14 at 12:14