
I am running over 40 spiders which until now have been scheduled via cron and started with scrapy crawl. For several reasons I am now switching to scrapyd; one of them is being able to see which jobs are running, so that I can cancel a job when I need to do maintenance and reboot.

Is it possible to cancel multiple jobs at once? I noticed that several jobs may be running at the same time while many more are waiting in the queue with status "pending". Stopping the crawl would therefore require multiple calls to the cancel.json endpoint.

How can I stop (or, better, pause) all jobs?

merlin

1 Answer


The scrapyd API (as of v1.3.0) does not support pausing. It does support stopping one job per call, however, so you have to loop over the jobs yourself.

I took @kolas's script from this question and updated it to work with Python 3.

import json, os

PROJECT_NAME = "MY_PROJECT"

# Dump the job list for the project to a temporary file via curl.
os.system('curl http://localhost:6800/listjobs.json?project={} > kill_job.text'.format(PROJECT_NAME))

with open('kill_job.text', 'r') as f:
    jobs = json.load(f)

# Cancel every job that is still waiting in the queue with status "pending".
for job in jobs['pending']:
    job_id = job['id']
    kill = 'curl http://localhost:6800/cancel.json -d project={} -d job={}'.format(PROJECT_NAME, job_id)
    os.system(kill)
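
If you prefer not to shell out to curl, the same loop can be written with the requests library. This is a minimal sketch assuming scrapyd is listening on localhost:6800 and using the same placeholder project name as above; it cancels running jobs as well as pending ones.

import requests

BASE_URL = "http://localhost:6800"
PROJECT_NAME = "MY_PROJECT"

# Fetch the job list for the project.
jobs = requests.get(BASE_URL + "/listjobs.json", params={"project": PROJECT_NAME}).json()

# Cancel everything that has not finished yet: pending jobs are removed
# from the queue, running jobs are sent a stop signal.
for job in jobs.get("pending", []) + jobs.get("running", []):
    resp = requests.post(BASE_URL + "/cancel.json",
                         data={"project": PROJECT_NAME, "job": job["id"]})
    print(job["id"], resp.json().get("status"))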
reading_ant