What is the proper way to install/activate a spider that is controlled by scrapyd?
I install a new spider version using scrapyd-deploy; a job is currently running. Do I have to stop the job using cancel.json, then schedule a new job?
Answering my own question:
I wrote a little Python script that stops all running spiders. After running this script, I run scrapyd-deploy, then relaunch my spiders (a sketch of that last step is at the end of this answer).
I am still not sure whether this is how scrapy pros would do it, but it looks sensible to me.
This is the script (adjust the value of PROJECT to match your project); it requires the requests package (pip install requests):
import requests
import sys
import time

PROJECT = 'crawler'  # replace with your project's name

# ask scrapyd for this project's jobs; the "running" key lists the active ones
resp = requests.get("http://localhost:6800/listjobs.json?project=%s" % PROJECT)
list_json = resp.json()

failed = False
count = len(list_json["running"])
if count == 0:
    print("No running spiders found.")
    sys.exit(0)

for sp in list_json["running"]:
    # cancel this spider
    r = requests.post("http://localhost:6800/cancel.json",
                      data={"project": PROJECT, "job": sp["id"]})
    print("Sent cancel request for %s %s" % (sp["spider"], sp["id"]))
    print("Status: %s" % r.json())
    if r.json()["status"] != "ok":
        print("ERROR: Failed to stop spider %s" % sp["spider"])
        failed = True

if failed:
    sys.exit(1)

# poll running spiders and wait until all spiders are down
while count:
    time.sleep(2)
    resp = requests.get("http://localhost:6800/listjobs.json?project=%s" % PROJECT)
    count = len(resp.json()["running"])
    print("%d spiders still running" % count)