
I'm using Scrapyd to run Scrapy as a web service.

I would like to use a curl command with parameters like this:

curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domain=www.google.fr

But I don't know how to get the domain parameter inside the spider.

import scrapy
from scrapy import Item, Field
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MyItem(Item):
    url = Field()

class HttpbinSpider(CrawlSpider):

    name = "expired"
    start_urls = [domain]  # <- how do I fill this from the curl parameter?

Sometimes I need to pass a single domain as an argument, sometimes several.

Thanks!


2 Answers


It's not possible directly: spider arguments arrive as plain strings, and Scrapy has no built-in support for list-valued arguments.

Users typically work around this by serializing the argument (e.g. as JSON) on the curl side and deserializing it in the spider's __init__():

curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domains='["www1.example.com", "www2.example.com"]'

Code:

import json
from scrapy import Spider

class MySpider(Spider):
    def __init__(self, domains=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # domains arrives as a single JSON string, e.g. '["www1.example.com", "www2.example.com"]'
        self.domains = json.loads(domains) if domains else []
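Applied to the spider from the question, the deserialized list can then populate start_urls. This is a minimal sketch; the https:// scheme and the root path are assumptions, since the argument only carries bare domain names:

import json
from scrapy.spiders import CrawlSpider

class HttpbinSpider(CrawlSpider):
    name = "expired"

    def __init__(self, domains=None, *args, **kwargs):
        super(HttpbinSpider, self).__init__(*args, **kwargs)
        # domains is the JSON string passed via -d domains=... (None if omitted)
        self.allowed_domains = json.loads(domains) if domains else []
        # assumption: fetch each site over https at its root URL
        self.start_urls = ["https://%s/" % d for d in self.allowed_domains]

The same -d domains='[...]' call then works whether the list contains one domain or many.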
  1. Forward any extra keyword arguments to the base class; scrapy.Spider's default __init__ sets them as attributes on the spider:

     class YourSpider(scrapy.Spider):
         def __init__(self, *args, **kwargs):
             super(YourSpider, self).__init__(*args, **kwargs)

  2. Schedule the spider, passing a setting and an arbitrary argument:

     curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider -d setting=DOWNLOAD_DELAY=2 -d arg1=val1

  3. Inside the spider, print(self.arg1) then outputs val1 (see the sketch after this list).
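A minimal sketch of reading such an argument in a callback; the spider name, placeholder start URL, and the getattr default are assumptions so the spider still runs when arg1 is omitted:

import scrapy

class YourSpider(scrapy.Spider):
    name = "somespider"
    start_urls = ["https://example.com/"]  # placeholder for the sketch

    def parse(self, response):
        # arg1 was set as an attribute by scrapy.Spider.__init__
        self.logger.info("arg1 = %s", getattr(self, "arg1", None))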
