I am trying to run a scrapy crawler through scrapyd with JOBDIR. I have a script that sends a POST request to the scrapyd server:
scrapyd_script.py:
import requests
import json
import logging

logging.basicConfig(
    filename="scrapyd_script.log",
    format="%(asctime)s %(message)s",
    filemode="w",
)
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def start_job():
    # Schedule the spider via scrapyd, passing JOBDIR as a spider setting
    payload = {
        "project": "default",
        "spider": "houzz_crawler",
        "setting": "JOBDIR=houzz_crawler",
    }
    response = requests.post("http://localhost:6800/schedule.json", data=payload)
    return json.loads(response.text)


if __name__ == "__main__":
    job_data = start_job()
    logger.info(job_data)
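Scheduling itself works: schedule.json returns a job id, and as a sanity check I can poll scrapyd's listjobs.json endpoint to see the job running (a minimal sketch; "default" is the project name from the payload above):

import requests

# Ask scrapyd which jobs are pending/running/finished for the project
resp = requests.get(
    "http://localhost:6800/listjobs.json",
    params={"project": "default"},
)
jobs = resp.json()
print("running:", jobs["running"])  # each entry has 'id', 'spider', 'start_time'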
And I created a systemd service to run scrapyd_script.py on reboot.
scrapyd_script.service:
[Unit]
Description=My Lovely Service
After=network.target

[Service]
Type=idle
Restart=on-failure
User=root
ExecStart=/bin/bash -c 'cd /home/..../houzz/ && source venv/bin/activate && python /home/..../houzz/houzz_crawler/scrapyd_script.py'

[Install]
WantedBy=multi-user.target
The service does start on reboot, but the problem is that every time the system reboots, the crawler starts from scratch instead of resuming where it left off. How can I resume the crawler from its previous state after a system reboot?
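For reference, this is how I understand JOBDIR persistence in Scrapy, and a minimal sketch of how one could check whether any state actually survives the reboot (I'm assuming the relative path houzz_crawler resolves against scrapyd's working directory, since scrapyd is the process that spawns the spider):

import os

# Scrapy's JOBDIR should contain the persisted crawl state:
#   requests.queue/  - the serialized scheduler queue
#   requests.seen    - the dupefilter's request fingerprints
#   spider.state     - pickled spider state (only if the spider uses it)
jobdir = "houzz_crawler"
for name in ("requests.queue", "requests.seen", "spider.state"):
    path = os.path.join(jobdir, name)
    print(path, "->", "exists" if os.path.exists(path) else "missing")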