I'm using scrapy-splash to extract information from JavaScript-driven, IFRAMEd HTML pages. Sometimes my Splash JavaScript function fails due to some browser condition and returns an error message like {"error": "NotSupportedError: DOM Exception 9"}.
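For reference, the requests are generated roughly like the sketch below (simplified; the spider name, the Lua source and the item fields are placeholders for my real code, and I'm assuming the Splash 'execute' endpoint here):

import scrapy
from scrapy_splash import SplashRequest

class NewspaperSpider(scrapy.Spider):
    name = "newspaper"  # placeholder name

    lua_script = """..."""  # wraps the JavaScript that sometimes fails

    def start_requests(self):
        for url in self.start_urls:
            # run the script inside Splash and get its result back as JSON
            yield SplashRequest(url, self.parse_layout,
                                endpoint='execute',
                                args={'lua_source': self.lua_script})

    def parse_layout(self, response):
        # response.data is the JSON returned by the Splash script;
        # when the script fails it is just {"error": "..."}
        item = dict(response.data)
        item['url'] = response.url
        yield item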
In my Item pipeline I drop these items in order to keep my results clean:
from scrapy.exceptions import DropItem

class NewspaperLayoutPipeline(object):

    def process_item(self, item, spider):
        # Discard items for which the Splash script only returned an error
        if 'error' in item:
            raise DropItem("Error capturing item %s" % item)
        ...
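The pipeline is enabled in settings.py in the usual way (the module path here is just an example):

ITEM_PIPELINES = {
    'myproject.pipelines.NewspaperLayoutPipeline': 300,
}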
Unfortunately, my error-item rate is about 40%, so I'd like scrapy-splash to retry these failed URLs instead of simply dropping the items. How can I do that?
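The closest I've come is to detect the error in the spider callback (the parse_layout sketched above) and re-yield the SplashRequest by hand, roughly like this; the splash_retries counter in meta and MAX_SPLASH_RETRIES are my own invention, not anything provided by scrapy-splash:

    MAX_SPLASH_RETRIES = 3  # arbitrary cap I picked

    def parse_layout(self, response):
        data = response.data
        if 'error' in data:
            retries = response.meta.get('splash_retries', 0)
            if retries < self.MAX_SPLASH_RETRIES:
                # re-issue the same Splash request; dont_filter bypasses the dupefilter
                yield SplashRequest(response.url, self.parse_layout,
                                    endpoint='execute',
                                    args={'lua_source': self.lua_script},
                                    meta={'splash_retries': retries + 1},
                                    dont_filter=True)
            return
        item = dict(data)
        item['url'] = response.url
        yield item

But this feels like reimplementing Scrapy's RetryMiddleware by hand, so I'm hoping there is a cleaner, built-in way to make scrapy-splash retry on these script errors.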