1

I have two different spiders running. I want to write two separate CSV files, each named after its spider: spider1.csv for data from spider1 and spider2.csv for data from spider2.

Here's my CsvPipeline class:

class CsvPipeline(object):
    def __init__(self):
        self.file = open("ss.csv", 'wb')
        self.exporter = CsvItemExporter(self.file, unicode)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        del item['crawlid']
        del item['appid']
        return item
CodeNinja101

4 Answers

4

Scrapy already has a built-in feed exporter; see the Scrapy docs.

In short, you only need to add these to your settings.py:

FEED_URI = 'somename.csv'
FEED_FORMAT = 'csv'

You can also set these settings per spider:

from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'
    custom_settings = {'FEED_URI': 'myspider.csv'}
Granitosaurus
2

You can use named parameters in a FEED_URI setting, which are replaced by spider attributes:

FEED_URI = '%(name)s.csv'
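
To see what that placeholder turns into, here is a minimal sketch of the substitution using plain Python %-formatting. The helper name expand_feed_uri and the attribute dicts are hypothetical, made up for illustration; Scrapy's own code may build the parameters differently.

```python
def expand_feed_uri(template, spider_attrs):
    """Fill %(key)s placeholders in a feed URI from a dict of attributes."""
    return template % spider_attrs


# Hypothetical attribute dicts standing in for two spiders.
for attrs in ({"name": "spider1"}, {"name": "spider2"}):
    print(expand_feed_uri("%(name)s.csv", attrs))
```

With a single FEED_URI in settings.py, each spider therefore ends up with its own file.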
paul trmbrth
1

I would implement the open_spider(self, spider) method. From the Scrapy docs:

This method is called when the spider is opened.

Parameters: spider (Spider object) – the spider which was opened

Opening the file there, instead of in __init__, lets you name it after the spider:

from scrapy.exporters import CsvItemExporter


class CsvPipeline(object):
    def open_spider(self, spider):
        # One output file per spider, named after the spider.
        self.file = open("%s.csv" % spider.name, 'wb')
        self.exporter = CsvItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        del item['crawlid']
        del item['appid']
        return item

For more, see the Scrapy item pipeline documentation.
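
If you want to try the open_spider / process_item / close_spider sequence without installing Scrapy, here is a rough stand-alone sketch. It is not the pipeline above: it swaps CsvItemExporter for the standard csv module, FakeSpider is a hypothetical stand-in that only has a name, and it drops crawlid and appid before writing, which is my assumption about what was intended. It also assumes each crawl gets its own pipeline instance.

```python
import csv
import os
import tempfile


class StdlibCsvPipeline:
    """Per-spider CSV writer using only the csv module (illustrative)."""

    def __init__(self, out_dir="."):
        self.out_dir = out_dir

    def open_spider(self, spider):
        # Name the output file after the spider that was just opened.
        path = os.path.join(self.out_dir, "%s.csv" % spider.name)
        self.file = open(path, "w", newline="", encoding="utf-8")
        self.writer = None  # created once the first item's keys are known

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # Drop the internal fields first so they never reach the CSV.
        row = {k: v for k, v in item.items() if k not in ("crawlid", "appid")}
        if self.writer is None:
            self.writer = csv.DictWriter(self.file, fieldnames=list(row))
            self.writer.writeheader()
        self.writer.writerow(row)
        return row


class FakeSpider:
    """Hypothetical stand-in for a Scrapy spider; only `name` is used."""

    def __init__(self, name):
        self.name = name


def run_demo(out_dir):
    """Run two fake spiders through the pipeline and list the files written."""
    for name in ("spider1", "spider2"):
        spider = FakeSpider(name)
        pipeline = StdlibCsvPipeline(out_dir)
        pipeline.open_spider(spider)
        item = {"title": name + "-item", "crawlid": "c", "appid": "a"}
        pipeline.process_item(item, spider)
        pipeline.close_spider(spider)
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    print(run_demo(tempfile.mkdtemp()))
```

Each spider gets its own file because the file name is only decided in open_spider, when the spider is known.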

Ganesh Pandey
0

In recent Scrapy versions (2.1 and later), the FEED_URI and FEED_FORMAT settings have been deprecated in favor of the FEEDS setting.

So what used to be defined as (as mentioned in @paul trmbrth's answer):

FEED_URI = '%(name)s.csv'
FEED_FORMAT = 'csv'

should now be defined as:

FEEDS = {
    '%(name)s.csv': {
        'format': 'csv',
    }
}
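
The mapping between the two forms is mechanical: the old URI becomes the dictionary key and the old format goes into that key's options. As a small sketch (legacy_to_feeds is a hypothetical helper for illustration, not part of Scrapy):

```python
def legacy_to_feeds(feed_uri, feed_format):
    """Build a FEEDS-style dict from the old FEED_URI / FEED_FORMAT pair."""
    return {feed_uri: {"format": feed_format}}


# Equivalent of FEED_URI = '%(name)s.csv' and FEED_FORMAT = 'csv'.
FEEDS = legacy_to_feeds("%(name)s.csv", "csv")
print(FEEDS)
```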

I think this is more elegant than overriding the feed URI for each spider or implementing a custom pipeline to do what the feed exporter already does.

(Hope this helps anyone who finds this question but is using a more recent Scrapy version.)

Meiogordo