What is the proper way to install/activate a spider that is controlled by scrapyd?
I install a new spider version using scrapyd-deploy; a job is currently running. Do I have to stop the job using cancel.json, then schedule a new job?
Answering my own question:
I wrote a little Python script that stops all running spiders. After running this script, I run scrapyd-deploy, then relaunch my spiders (a sketch of that last step is at the end of this answer).
I am still not sure whether this is how scrapy pros would do it, but it looks sensible to me.
This is the script (adjust the value of PROJECT to match your project); it requires the requests package (pip install requests):
import requests
import sys
import time

PROJECT = 'crawler'  # replace with your project's name

# ask scrapyd for this project's jobs; the "running" key lists the active ones
resp = requests.get("http://localhost:6800/listjobs.json?project=%s" % PROJECT)
list_json = resp.json()

failed = False
count = len(list_json["running"])
if count == 0:
    print("No running spiders found.")
    sys.exit(0)

for sp in list_json["running"]:
    # cancel this spider
    r = requests.post("http://localhost:6800/cancel.json",
                      data={"project": PROJECT, "job": sp["id"]})
    print("Sent cancel request for %s %s" % (sp["spider"], sp["id"]))
    print("Status: %s" % r.json())
    if r.json()["status"] != "ok":
        print("ERROR: Failed to stop spider %s" % sp["spider"])
        failed = True

if failed:
    sys.exit(1)

# poll running spiders and wait until all spiders are down
while count:
    time.sleep(2)
    resp = requests.get("http://localhost:6800/listjobs.json?project=%s" % PROJECT)
    count = len(resp.json()["running"])
    print("%d spiders still running" % count)