I'm trying to configure spiders created with slyd to use scrapy-elasticsearch, so I'm sending -d parameter=value to configure it:

curl http://localhost:6800/schedule.json -d project=myproject  -d spider=myspider -d setting=CLOSESPIDER_ITEMCOUNT=100 -d settings=ITEM_PIPELINE=scrapyelasticsearch.ElasticSearchPipeline -d setting=ELASTICSEARCH_SERVER=localhost -d setting=ELASTICSEARCH_PORT=9200 -d setting=ELASTICSEARCH_INDEX=scrapy -d setting=ELASTICSEARCH_TYPE=items -d setting=ELASTICSEARCH_UNIQ_KEY=url

This is the default configuration that should go into settings.py:

ITEM_PIPELINES = [
  'scrapyelasticsearch.ElasticSearchPipeline',
]

ELASTICSEARCH_SERVER = 'localhost' # If not 'localhost' prepend 'http://'
ELASTICSEARCH_PORT = 9200 # If port 80 leave blank
ELASTICSEARCH_USERNAME = ''
ELASTICSEARCH_PASSWORD = ''
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_TYPE = 'items'
ELASTICSEARCH_UNIQ_KEY = 'url'

I haven't found a way to configure slyd to generate these settings automatically with each new spider, so I'm trying to pass the options as -d arguments. scrapyd still runs the spiders, but nothing gets saved to Elasticsearch. The error log shows the following:

[Launcher,1900/stderr] /home/PORTIA/local/lib/python2.7/site-packages/scrapy/contrib/pipeline/__init__.py:21: 
ScrapyDeprecationWarning: ITEM_PIPELINES defined as a list or a set is deprecated, switch to a dict
      category=ScrapyDeprecationWarning, stacklevel=1)
localhost

1 Answer

In newer versions of Scrapy, you have to define pipelines as a dictionary, like this:

    ITEM_PIPELINES = {'scrapyelasticsearch.ElasticSearchPipeline': 100}

What you can do is keep all your spiders in the same project, under the spiders folder, so that they share the same settings and pipelines.
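
For illustration, a shared settings.py for such a project might look roughly like the sketch below. It reuses the values from the question as-is; the priority 100 is just the number from the dict above (any integer works, lower values run earlier), and the empty username and password are placeholders.

```python
# settings.py (sketch): shared by every spider in the project.
# Values are copied from the question; adjust them for your own setup.

# Dict form: pipeline path -> order (lower numbers run first).
ITEM_PIPELINES = {
    'scrapyelasticsearch.ElasticSearchPipeline': 100,
}

ELASTICSEARCH_SERVER = 'localhost'  # If not 'localhost' prepend 'http://'
ELASTICSEARCH_PORT = 9200           # If port 80 leave blank
ELASTICSEARCH_USERNAME = ''
ELASTICSEARCH_PASSWORD = ''
ELASTICSEARCH_INDEX = 'scrapy'
ELASTICSEARCH_TYPE = 'items'
ELASTICSEARCH_UNIQ_KEY = 'url'
```

With this in place, the only per-run option left to pass to schedule.json would be something like CLOSESPIDER_ITEMCOUNT.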

Tasawer Nawaz
  • Is there a way to send that dict as a parameter to the scrapyd rest? If not how can I share the same settings with all spiders? I wanted a way to setup once and use in all projects. – localhost Sep 11 '14 at 14:52
  • While this doesn't answer sending the pipeline as a parameter, it does work, and I eventually just set slyd's project creation template to add it. – localhost Sep 11 '14 at 18:22
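
On the first comment's question, here is an untested sketch of how the schedule.json form body could be built so that every override uses the same repeated `setting` field, with the dict serialized as JSON. The helper name `build_schedule_body` is made up for this example. Whether Scrapy accepts a JSON string for ITEM_PIPELINES depends on the Scrapy version, so treat that part as an assumption to verify. The sketch only builds and prints the body; it does not contact scrapyd.

```python
import json
from urllib.parse import urlencode, parse_qsl


def build_schedule_body(project, spider, settings):
    """Build a form-encoded body for scrapyd's schedule.json (hypothetical helper).

    Each override is sent as a repeated 'setting' field of the form NAME=VALUE.
    """
    fields = [("project", project), ("spider", spider)]
    for name, value in settings.items():
        # Assumption: dict/list values are passed as JSON text; whether
        # Scrapy parses that for a given setting depends on its version.
        if isinstance(value, (dict, list)):
            value = json.dumps(value)
        fields.append(("setting", "%s=%s" % (name, value)))
    return urlencode(fields)


body = build_schedule_body("myproject", "myspider", {
    "CLOSESPIDER_ITEMCOUNT": 100,
    "ITEM_PIPELINES": {"scrapyelasticsearch.ElasticSearchPipeline": 100},
    "ELASTICSEARCH_SERVER": "localhost",
    "ELASTICSEARCH_PORT": 9200,
})

# Decode again to show the fields scrapyd would receive.
for key, value in parse_qsl(body):
    print(key, value)
```

The resulting string could be sent as the POST body to http://localhost:6800/schedule.json, which is what the repeated -d options in the curl command do.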