I'm trying to schedule a crawler on EC2 and have the output export to a CSV file, cppages-nov.csv, while also creating a jobdir in case I need to pause the crawl, but no files are being created. Am I passing the feed export settings correctly?

curl http://awsserver:6800/schedule.json -d project=wallspider -d spider=cppages -d JOBDIR=/home/ubuntu/scrapy/sitemapcrawl/crawls/cppages-nov -d FEED_URI=/home/ubuntu/scrapy/sitemapcrawl/cppages-nov.csv -d FEED_FORMAT=csv

2 Answers


Scrapyd treats plain -d parameters as spider arguments, not Scrapy settings; settings overrides such as FEED_URI, FEED_FORMAT, and JOBDIR have to be passed through the setting parameter:

curl http://amazonaws.com:6800/schedule.json -d project=wallspider -d spider=cppages -d setting=FEED_URI=/home/ubuntu/scrapy/sitemapcrawl/results/cppages.csv -d setting=FEED_FORMAT=csv -d setting=JOBDIR=/home/ubuntu/scrapy/sitemapcrawl/crawl/cppages-nov
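For anyone scripting this instead of using curl, here is a minimal sketch of the same call with the python-requests library (the host and paths come from the command above; the requests usage itself is illustrative, not part of the original answer). Repeated setting fields are sent as separate form parameters, which Scrapyd applies as individual settings overrides:

import requests

response = requests.post(
    "http://amazonaws.com:6800/schedule.json",
    data=[
        ("project", "wallspider"),
        ("spider", "cppages"),
        # Each "setting" entry is forwarded to Scrapy as a settings override.
        ("setting", "FEED_URI=/home/ubuntu/scrapy/sitemapcrawl/results/cppages.csv"),
        ("setting", "FEED_FORMAT=csv"),
        ("setting", "JOBDIR=/home/ubuntu/scrapy/sitemapcrawl/crawl/cppages-nov"),
    ],
)
print(response.json())  # e.g. {"status": "ok", "jobid": "..."}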


Alternatively, register the CSV feed exporter in your project's settings file:

FEED_EXPORTERS = {
    'csv': 'scrapy.contrib.exporter.CsvItemExporter',
}
FEED_FORMAT = 'csv'
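Note that in Scrapy 1.0 and later the scrapy.contrib.exporter path is deprecated in favor of scrapy.exporters, so on a current install the equivalent settings would be:

FEED_EXPORTERS = {
    'csv': 'scrapy.exporters.CsvItemExporter',
}
FEED_FORMAT = 'csv'

CsvItemExporter is also the built-in default for the csv feed format, so this mapping is only needed if you want to override it.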