I am scraping a single domain using Scrapy and the Crawlera proxy, and sometimes, due to Crawlera issues (technical break), I get a 407 status code and can't scrape anything. Is it possible to stop the request pipeline for 10 minutes and then restart the spider? To be clear, I don't want to defer the request, but stop everything (maybe except item processing) for 10 minutes until they resolve the problem. I am running 10 concurrent threads.
1 Answer
Yes you can. There are a few ways of doing this, but the most obvious is to simply insert some blocking code:
# middlewares.py
import time

class BlockMiddleware:
    def process_response(self, request, response, spider):
        if response.status == 407:
            print('beep boop, taking a nap')
            time.sleep(60)  # blocks the whole process for 60 seconds
        return response
and activate it:
# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.BlockMiddleware': 100,
}

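Adjust the sleep duration to taste; for the 10 minutes mentioned in the question that would be time.sleep(600).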
Granitosaurus
thanks! Will sleep block all concurrent requests or just one? – Bociek Feb 16 '19 at 13:43
It will block the whole program :) – Granitosaurus Feb 16 '19 at 13:50
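Since the sleep blocks the whole (single-threaded) process, a non-blocking variant would be to pause Scrapy's engine and schedule an unpause via the Twisted reactor instead of sleeping. The following is only a rough sketch, not part of the answer above; the middleware name, the 600-second delay and the re-scheduled retry request are assumptions:

# middlewares.py (sketch of a non-blocking alternative)
from twisted.internet import reactor

class PauseOnProxyErrorMiddleware:
    def __init__(self, crawler, delay=600):
        self.crawler = crawler
        self.delay = delay  # seconds to stay paused (600 s = 10 minutes)

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_response(self, request, response, spider):
        if response.status == 407:
            spider.logger.warning('407 from proxy, pausing crawl for %s seconds', self.delay)
            self.crawler.engine.pause()  # stop fetching new requests
            reactor.callLater(self.delay, self.crawler.engine.unpause)  # resume later
            # re-schedule the failed request so it is retried after the pause
            return request.replace(dont_filter=True)
        return response

It would be enabled in DOWNLOADER_MIDDLEWARES the same way as BlockMiddleware above.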