
It looks like there is no official way to pause/continue crawling with Scrapyd, as there is with Scrapy itself (`scrapy crawl spider -s JOBDIR=jobdir/spider-1`). The only solution I've found is here:

https://groups.google.com/forum/#!msg/scrapyusers/2VjEBEbOo7s/MSH4GJc2B0sJ

but I don't understand the idea.

So is it possible to pause/continue a job with Scrapyd?

Thanks for your time

Pachvarsh

1 Answer


The idea is the following:

When you run your spider from the command line you can pass an additional setting, for example: `scrapy crawl myspider -s JOBDIR=crawls/somespider-1`

When you use Scrapyd's schedule.json endpoint you can also pass settings. If you are using a Python wrapper such as python-scrapyd-api, this looks like:

from scrapyd_api import ScrapydAPI  # pip install python-scrapyd-api
api = ScrapydAPI('http://localhost:6800')
s = {'JOBDIR': 'crawls/somespider-1'}
job = api.schedule(project_name, spider_name, settings=s)

After that, you can cancel the job. The next time you schedule the same spider with the same JOBDIR, Scrapyd will resume the previous work.
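
For completeness, a minimal sketch of the cancel-and-resume step with the same wrapper (python-scrapyd-api also exposes Scrapyd's cancel.json endpoint as `cancel`; `api`, `s`, and `job` are the names defined above):

# "Pause": cancel the running job; because JOBDIR is set, the crawl
# state is flushed to disk when the spider shuts down.
api.cancel(project_name, job)

# "Resume": schedule again with the same JOBDIR to continue where it left off.
job = api.schedule(project_name, spider_name, settings=s)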

amarynets
  • Sorry, I don't understand where to put these options. If I run the crawler with `curl http://localhost:6800/schedule.json -d project=my-parsers -d spider=my_spider`, where should I paste your code? – Pachvarsh Sep 26 '17 at 06:53
  • In this case, you can make the call like this: `curl localhost:6800/schedule.json -d project=my-parsers -d spider=my_spider -d settings=JOBDIR=crawls/somespider-1` – amarynets Sep 27 '17 at 03:58
  • Unfortunately it doesn't work either. When I resume such a job with `curl localhost:6800/schedule.json -d project=my-parsers -d spider=my_spider -d settings=JOBDIR=crawls/somespider-1 -d jobid=3123asds5f34141asdfesw324`, the new parse starts from the beginning; only the log is the same – Pachvarsh Sep 27 '17 at 04:56
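
Piecing the comments together: Scrapyd's schedule.json expects the singular `setting=NAME=value` parameter, not `settings=`; any unrecognized parameter is passed through as an ordinary spider argument, so with `settings=JOBDIR=...` the JOBDIR setting is never applied, which would explain why the resume attempt above started from the beginning. A minimal sketch of the whole pause/resume cycle over plain HTTP with the `requests` library, using the project and spider names from this thread:

import requests

SCRAPYD = 'http://localhost:6800'
params = {
    'project': 'my-parsers',
    'spider': 'my_spider',
    'setting': 'JOBDIR=crawls/somespider-1',  # note: singular 'setting'
}

# Start the job; Scrapyd replies with the jobid it assigned.
jobid = requests.post(SCRAPYD + '/schedule.json', data=params).json()['jobid']

# "Pause": cancel the running job. Because JOBDIR is set, the spider
# persists its request queue and seen-requests state to disk on shutdown.
requests.post(SCRAPYD + '/cancel.json', data={'project': 'my-parsers', 'job': jobid})

# "Resume": schedule again with the same JOBDIR; the spider picks up the
# persisted state instead of starting over.
requests.post(SCRAPYD + '/schedule.json', data=params)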