
I am trying to run several spiders from a cron-based task.

I want to use scrapyd's schedule arguments to run the same spider with different settings.

What I am trying to do is:

I am trying to set RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408] and a custom Spidermon monitor suite, SPIDERMON_SPIDER_CLOSE_MONITORS = ('crawler.monitors.SpiderCloseMonitorSuite',), through the scrapyd schedule.json curl call.

Somehow scrapyd does not pick up my custom settings, though.

I played around with escaping and other bash quoting tricks, but in the end nothing worked. I am starting to wonder whether this is even possible.

curl http://localhost:6800/schedule.json \
     -d project=M0 \
     -d spider=m_pp \
     -d setting=LOG_LEVEL='DEBUG' \
     -d setting=RETRY_HTTP_CODES=[500,503,504,400,403,408,] \
     -d setting=SPIDERMON_SPIDER_CLOSE_MONITORS="('crawler.monitors.SpiderCloseMonitorSuite',)" \
     -d _version="r857-M360-416-disable-c"
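
One variant I considered is dropping the Python-literal syntax and passing plain strings instead, on the assumption that Scrapy reads RETRY_HTTP_CODES with getlist(), which splits a comma-separated string; whether spidermon would accept a bare dotted path for the monitor suite is also just my assumption:

curl http://localhost:6800/schedule.json \
     -d project=M0 \
     -d spider=m_pp \
     -d setting=LOG_LEVEL=DEBUG \
     -d setting=RETRY_HTTP_CODES=500,503,504,400,403,408 \
     -d setting=SPIDERMON_SPIDER_CLOSE_MONITORS=crawler.monitors.SpiderCloseMonitorSuite \
     -d _version=r857-M360-416-disable-c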

Any help or workaround is welcome.


1 Answer


This method requires Python ≥ 3.8.

The approach I use is to pass the value as a spider argument instead of a setting:

from scrapy.spiders import CrawlSpider


class MyCrawlerSpider(CrawlSpider):
    name = "my_crawler"

    ##################################################################
    # INIT
    ##################################################################

    def __init__(self, *args, **kwargs):
        # Spider arguments passed via scrapyd's -d parameters arrive
        # here as keyword arguments.
        self.test = kwargs.get("test")

        super().__init__(*args, **kwargs)

Then I run:

curl http://localhost:6800/schedule.json -d test=True -d project=mycrawler -d spider=my_crawler
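
One caveat: scrapyd delivers every -d argument as a string, so self.test above holds the text "True", not a boolean. A minimal sketch of coercing it inside __init__ (using the same hypothetical test argument as above):

    def __init__(self, *args, **kwargs):
        # scrapyd delivers every -d argument as a string, so "True"
        # arrives as text; coerce it to a real boolean before use.
        raw = kwargs.get("test", "False")
        self.test = str(raw).lower() in ("true", "1", "yes")

        super().__init__(*args, **kwargs)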
