In the end, I overwrite the retry middleware just for a small change. I set so every time the scraper gave up retrying on something, doesn't matter what is the status code, it will be marked as an error.
It seems Scrapy somehow doesn't associate giving up retrying as an error. That's weird for me.
This is the middleware if anyone wants to use it. Don't forget to activate it on the settings.py
from scrapy.downloadermiddlewares.retry import *
class Retry500Middleware(RetryMiddleware):
def _retry(self, request, reason, spider):
retries = request.meta.get('retry_times', 0) + 1
if retries <= self.max_retry_times:
logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
{'request': request, 'retries': retries, 'reason': reason},
extra={'spider': spider})
retryreq = request.copy()
retryreq.meta['retry_times'] = retries
retryreq.dont_filter = True
retryreq.priority = request.priority + self.priority_adjust
return retryreq
else:
# This is the point where I update it. It used to be `logger.debug` instead of `logger.error`
logger.error("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
{'request': request, 'retries': retries, 'reason': reason},
extra={'spider': spider})