
Wondering if anyone knows if scrapy-crawlera middleware handles the 429 status code when using scrapy, or if I need to implement my own retry logic?

I can't seem to find it documented anywhere

Kevin Glasson

2 Answers


To answer your question: no, the scrapy-crawlera middleware doesn't handle the 429 status. In fact, it doesn't "handle" any status at all; it only takes care of the communication between Scrapy and Crawlera.

Crawlera itself, however, does handle 429 by default: when it receives a 429 response, it marks it as a ban and retries the same request.

If Crawlera doesn't succeed after several retries, it returns a 503 status to the client (Scrapy in this case).
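
For illustration (not part of this answer), here is a minimal sketch of how you could surface those final 503 responses in your own spider instead of letting HttpErrorMiddleware drop them. The spider name, URL, and the X-Crawlera-Error header check are assumptions based on Crawlera's documented behaviour:

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/"]

    # Let 503 responses reach the callback instead of being
    # filtered out by HttpErrorMiddleware.
    handle_httpstatus_list = [503]

    def parse(self, response):
        if response.status == 503:
            # Crawlera reports the reason in the X-Crawlera-Error
            # response header (assumption based on its docs), e.g. "banned".
            error = response.headers.get("X-Crawlera-Error", b"").decode()
            self.logger.warning("Crawlera gave up on %s: %s", response.url, error)
            return
        # ... parse successful responses as usual ...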

eLRuLL

You can extend the list of retry codes with:

from scrapy.settings.default_settings import RETRY_HTTP_CODES

(see the documentation here: https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#std:setting-RETRY_HTTP_CODES)

And then in your spider:

custom_settings = {
    'RETRY_HTTP_CODES': RETRY_HTTP_CODES + [429],
}
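
For reference, combining the import and the setting, a complete spider might look like this; the spider name, URL, and parse logic are placeholders, not from the answer:

import scrapy
from scrapy.settings.default_settings import RETRY_HTTP_CODES


class RetrySpider(scrapy.Spider):
    name = "retry_example"
    start_urls = ["https://example.com/"]

    custom_settings = {
        # Scrapy's default retry codes plus 429 (Too Many Requests).
        "RETRY_HTTP_CODES": RETRY_HTTP_CODES + [429],
    }

    def parse(self, response):
        yield {"url": response.url, "status": response.status}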
vezunchik