In Scrapy, I end up writing the Item's field names in a lot of places.

1. Item class (items.py)

class HelloItem(scrapy.Item):
    Name = scrapy.Field()
    Address = scrapy.Field()
    ...

2. Spider class (spider.py)

class HelloSpider(scrapy.Spider):

    def parse(self, response):
        item = HelloItem()
        item["Name"] = ...
        item["Address"] = ...
        ...

3. settings.py

EXPORT_FIELDS = ["Name", "Address", ...]

I defined an EXPORT_FIELDS setting in settings.py to set the field order for a custom CSV item pipeline. The CSV pipeline code is like this, except that self.exporter.fields_to_export is loaded with settings.getlist("EXPORT_FIELDS").


As you can see, I have to write the field names (Name, Address, etc.) in three places. If I ever need to rename a field, I have to change it in all three files.

So is there a way to keep the Item's field name definitions in just one file? (Two files would also be fine; fewer is better than nothing.)

null

1 Answer

You could skip items altogether and yield plain dictionaries instead. That way, you would not need items.py at all.

However, as a project grows, defining an Item subclass is recommended, and the repetition you mention is a lesser evil.

Because you define an Item, you get an error (a KeyError) when one of your spiders tries to set an item field whose name contains a typo.

Item classes also allow you to work with item loaders.

Gallaecio