3

Scrapy contract fails if we are instantiating an Item or ItemLoader with the meta attribute or the Request() object passed from a previous parse method.

I was thinking of maybe overriding ScrapesContract to preprocess the request and load some dummy values in request.meta, not sure if that is good practice though.

I have seen the pre_process method in the docs (illustrated in the HasHeaderContract at the bottom) to get attributes from the request object, but I'm not sure if it can be used to set attributes.

EDIT: More details. Methods from an example crawler:

def parse_level_one(self, response):
   # populate loader
   return Request(url=url, callback=self.parse_level_two, meta={'loader': loader.load_item()})

def parse_level_two(self, response):
    """Parse product detail page

    @url http://example.com
    @scrapes some_field1 some_field2
    """
    loader = MyItemLoader(response.meta['loader'], response=response)

in the cli

$ scrapy check crawlername
Traceback... loader = MyItemLoader(response.meta['loader'], response=response)
KeyError: 'loader'

The idea that I am thinking about is this:

class LoadedScrapesContract(Contract):
    """ Contract to check presence of fields in scraped items
        @loadedscrapes page_name page_body
    """

    name = 'loadedscrapes'

    def pre_process(self, response):
        # MEDDLE WITH THE RESPONSE OBJECT HERE
        # TO ADD A META ATTRIBUTE TO RESPONSE,
        # LIKE AN EMPTY Item() or dict, JUST TO MAKE
        # THE ITEM LOADER INSTANTIATION PASS

    # this is same as ScrapesContract 
    def post_process(self, output):
        for x in output:
            if isinstance(x, BaseItem):
                for arg in self.args:
                    if not arg in x:
                        raise ContractFail("'%s' field is missing" % arg)
pad
  • 2,296
  • 2
  • 16
  • 23
  • Could you provide a reproduceable example demonstrating the problem? thanks. – alecxe Dec 08 '14 at 23:24
  • @alecxe I have added some details. For a reproducible example, add a contract to any parse method that instantiates an item or itemloader with the meta attribute of the Requests object. – pad Dec 09 '14 at 00:07
  • try passing the itemloader itself, instead of the item: `meta={'loader':itemloader}`? – bosnjak Dec 09 '14 at 07:43

1 Answers1

0

The best solution I've found for this, is to do the following rather than mucking up the contract

loader = MyItemLoader(response.meta.get('loader', MyItem()), response=response)

I prefer this method, but to stick the question, override adjust_request_args

pad
  • 2,296
  • 2
  • 16
  • 23