I'm using scrapy-splash to extract information from JavaScript-driven, IFRAMEd HTML pages. Sometimes my Splash JavaScript function fails due to some browser condition and returns an error message like {"error": "NotSupportedError: DOM Exception 9"}.

In my item pipeline, I drop these items to keep my results clean:

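from scrapy.exceptions import DropItem

class NewspaperLayoutPipeline(object):
    def process_item(self, item, spider):
        # dict.has_key() was removed in Python 3; the "in" operator works everywhere.
        if 'error' in item:
            raise DropItem("Error capturing item %s" % item)
        ...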

Unfortunately, about 40% of my items come back as errors, so I'd like scrapy-splash to retry these failed URLs instead of simply dropping the items. How can I do that?

1 Answer

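You cannot retry an item from an item pipeline.

Instead, check for the error in your spider and yield Request(url, dont_filter=True) for the same URL again. dont_filter=True is needed because the duplicate filter would otherwise discard a URL that has already been requested.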

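from scrapy import Request

def parse(self, response):
    # Assumes a scrapy-splash JSON response; adapt this to however you build the item.
    item = response.data
    if 'error' in item:
        # Re-queue the same URL (use SplashRequest instead if it must be rendered by Splash).
        yield Request(response.url, dont_filter=True)
    else:
        yield item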
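
With an error rate around 40%, retrying unconditionally could loop forever on a page that never renders, so it may be worth capping the number of attempts. Below is a minimal sketch of one way to do that. The helper name next_attempt, the meta key splash_retry_times, and the limit of 3 are all made up for illustration and are not part of Scrapy or scrapy-splash.

```python
MAX_SPLASH_RETRIES = 3  # hypothetical limit; tune it for your crawl


def next_attempt(data, meta, max_retries=MAX_SPLASH_RETRIES):
    """Return the retry number to use for a re-queued request, or None.

    data -- the decoded JSON returned by the Splash script (a dict)
    meta -- the meta dict of the response being handled
    None means "do not retry": either there was no error,
    or the retry budget for this URL is used up.
    """
    if "error" not in data:
        return None
    attempts = meta.get("splash_retry_times", 0)  # hypothetical meta key
    if attempts >= max_retries:
        return None
    return attempts + 1


# Possible use inside the spider callback (not executed here; the endpoint
# and args are placeholders for whatever your original request used):
#
# def parse(self, response):
#     data = response.data
#     if "error" in data:
#         attempt = next_attempt(data, response.meta)
#         if attempt is not None:
#             yield SplashRequest(
#                 response.url, self.parse,
#                 endpoint="execute", args={"lua_source": self.lua_source},
#                 meta={"splash_retry_times": attempt},
#                 dont_filter=True,
#             )
#         return  # give up on this URL once the budget is exhausted
#     yield data
```

Keeping the counter in the request meta means each URL carries its own retry budget, and the decision logic stays a plain function that can be checked without running a crawl.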
Umair Ayub