I'm new to Scrapy, and I need to pause a spider after receiving an error response (such as 407 or 429).
I need to do this without using `time.sleep()`, using a middleware or an extension instead.
Here is my middleware:
```python
from scrapy import signals
from pydispatch import dispatcher


class Handle429:
    def __init__(self):
        dispatcher.connect(self.item_scraped, signal=signals.item_scraped)

    def item_scraped(self, item, spider, response):
        if response.status == 429:
            print("THIS IS 429 RESPONSE")
            #
            # here stop spider for 10 minutes and then continue
            #
```
I read about `self.crawler.engine.pause()`, but how can I call it from my middleware and make the pause last a custom amount of time?
Or is there another way to do this? Thanks.
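One possible approach, sketched as a downloader middleware under a few assumptions: it pauses the engine with `crawler.engine.pause()`, schedules `crawler.engine.unpause()` with Twisted's non-blocking `reactor.callLater` instead of `time.sleep()`, and returns the failed request so it is re-queued and retried after the pause. The class name and the `PAUSE_HTTP_CODES` / `PAUSE_SECONDS` settings are hypothetical, not built into Scrapy. Pausing only stops the engine from pulling new requests from the scheduler, so requests already in flight still complete.

```python
import logging

logger = logging.getLogger(__name__)


class PauseOnErrorMiddleware:
    """Hypothetical downloader middleware: pause the whole crawl after a 407/429.

    PAUSE_HTTP_CODES and PAUSE_SECONDS are made-up setting names (assumptions).
    """

    def __init__(self, crawler, pause_codes=(407, 429), pause_seconds=600, call_later=None):
        self.crawler = crawler
        self.pause_codes = {int(code) for code in pause_codes}
        self.pause_seconds = pause_seconds
        # Injectable for testing; defaults to twisted's reactor.callLater.
        self._call_later = call_later
        self.paused = False

    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(
            crawler,
            pause_codes=settings.getlist("PAUSE_HTTP_CODES", [407, 429]),
            pause_seconds=settings.getfloat("PAUSE_SECONDS", 600),
        )

    def _schedule(self, delay, func):
        if self._call_later is None:
            # Scrapy runs on Twisted; callLater runs func later without blocking the reactor.
            from twisted.internet import reactor
            self._call_later = reactor.callLater
        self._call_later(delay, func)

    def _resume(self):
        self.paused = False
        # crawler.engine is looked up lazily: it may not exist yet when from_crawler runs.
        self.crawler.engine.unpause()
        logger.info("Resuming crawl")

    def process_response(self, request, response, spider):
        if response.status not in self.pause_codes:
            return response
        if not self.paused:
            self.paused = True
            logger.warning("Got HTTP %d, pausing crawl for %s seconds",
                           response.status, self.pause_seconds)
            # Stop the engine from pulling new requests from the scheduler.
            self.crawler.engine.pause()
            self._schedule(self.pause_seconds, self._resume)
        # Returning a Request re-queues it; dont_filter stops the dupefilter dropping it.
        return request.replace(dont_filter=True)
```

To enable it, something like `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.PauseOnErrorMiddleware": 560}` (module path hypothetical) should work. The order matters: `process_response` runs from higher to lower order values, and the built-in `RetryMiddleware` sits at 550 and retries 429 itself, so this middleware needs a higher number to see the 429 first. If the server keeps answering 429, the request is re-queued indefinitely; a counter stored in `request.meta` could cap the number of attempts.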