
I am currently using Scrapyd to start a crawling spider, and the DEPTH_LIMIT setting is defined in the Scrapy app's settings.

I was wondering how to pass the depth limit as a parameter in Scrapyd, so that it can be set dynamically, as requested by the user, for each crawl.

I believe I can only act on Scrapy's spiders and pipelines.

EDIT
Thanks to @John Smith's answer, I found out that it is possible to pass settings to Scrapyd's schedule method:

from scrapyd_api import ScrapydAPI  # python-scrapyd-api package

scrapyd = ScrapydAPI('http://localhost:6800')

settings = {
    'unique_id': unique_id,  # unique ID for the database instance
    'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'DEPTH_LIMIT': 1
}

# Schedule a new crawling task through Scrapyd
task_id = scrapyd.schedule('default', 'spider-name', settings=settings, url=url, domain=domain)
Nicolò Gasparini

1 Answer


Unless I am misunderstanding when and where exactly you want to specify DEPTH_LIMIT, you can pass any Scrapy setting override through the Scrapyd API call, like so:

curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DEPTH_LIMIT=5

http://scrapyd.readthedocs.io/en/stable/api.html#schedule-json
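The same request can be made from Python without curl; here is a minimal sketch using only the standard library. The project and spider names are the placeholders from the curl example above, the second `setting` override is hypothetical, and the actual HTTP call is left commented out:

```python
from urllib.parse import urlencode

# Form body for Scrapyd's schedule.json endpoint; repeating the
# "setting" key overrides several Scrapy settings in one call.
params = [
    ("project", "myproject"),
    ("spider", "somespider"),
    ("setting", "DEPTH_LIMIT=5"),
    ("setting", "USER_AGENT=MyBot/1.0"),  # hypothetical extra override
]
body = urlencode(params)
print(body)
# project=myproject&spider=somespider&setting=DEPTH_LIMIT%3D5&setting=USER_AGENT%3DMyBot%2F1.0

# To actually schedule the job:
# import urllib.request
# with urllib.request.urlopen("http://localhost:6800/schedule.json",
#                             data=body.encode()) as resp:
#     print(resp.read())
```

Scrapyd applies each `setting=KEY=VALUE` pair on top of the project's settings for that one job, which is exactly the per-crawl behavior asked about in the question.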

John Smith