Wondering if anyone knows if scrapy-crawlera middleware handles the 429 status code when using scrapy, or if I need to implement my own retry logic?
I can't seem to find it documented anywhere
To answer your question: no, the scrapy-crawlera middleware doesn't handle the 429 status. It doesn't actually "handle" any status; it only manages communication between Crawlera and Scrapy.
Crawlera itself does handle status 429 by default: when it gets a 429 response, it marks it as a ban and retries the same request.
If Crawlera doesn't succeed after several retries, it returns a 503 status to the client (Scrapy in this case).
You can extend the list of retry codes with:
from scrapy.settings.default_settings import RETRY_HTTP_CODES
(check documentation here: https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#std:setting-RETRY_HTTP_CODES)
And then in your spider:
custom_settings = {
    'RETRY_HTTP_CODES': RETRY_HTTP_CODES + [429],
}
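One thing to watch: if the Scrapy version you're on already lists 429 among its default retry codes, the concatenation above adds a duplicate entry, which is harmless but untidy. Here is a minimal sketch of a helper that only appends codes that are missing. Note that `with_extra_codes` and `EXAMPLE_DEFAULTS` are hypothetical names made up for this illustration; the example list just stands in for the imported `RETRY_HTTP_CODES` so the snippet runs without Scrapy installed.

```python
def with_extra_codes(default_codes, extra_codes):
    """Return default_codes plus any extra_codes not already present.

    Order is preserved and the input lists are left unmodified.
    """
    merged = list(default_codes)
    for code in extra_codes:
        if code not in merged:
            merged.append(code)
    return merged


# Hypothetical stand-in for scrapy.settings.default_settings.RETRY_HTTP_CODES;
# the real default list depends on your Scrapy version.
EXAMPLE_DEFAULTS = [500, 502, 503, 504]

# Shaped like the class attribute you would put on your spider.
custom_settings = {
    'RETRY_HTTP_CODES': with_extra_codes(EXAMPLE_DEFAULTS, [429]),
}
```

In a real spider you would pass the imported RETRY_HTTP_CODES in place of EXAMPLE_DEFAULTS.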