I am scraping a single domain using Scrapy and the Crawlera proxy. Sometimes, due to Crawlera issues (a technical break), I get a 407 status code and can't scrape anything. Is it possible to stop the request pipeline for 10 minutes and then restart the spider? To be clear, I do not want to defer the request, but to stop everything (maybe except item processing) for 10 minutes until they resolve the problem. I am running 10 concurrent threads.

Bociek

1 Answer


Yes, you can. There are a few ways of doing this, but the most obvious is to simply insert some blocking code:

# middlewares.py
import time

class BlockMiddleware:

    def process_response(self, request, response, spider):
        if response.status == 407:
            # Crawlera is having issues; block for 10 minutes
            print('beep boop, taking a nap')
            time.sleep(600)
        return response
and activate it:

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.BlockMiddleware': 100,
}
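
One caveat: time.sleep() blocks Scrapy's single-threaded Twisted reactor, so this halts everything, item processing included, not just the request pipeline. If you want downloads already in flight to keep flowing through the pipelines, a non-blocking variant is to pause the engine and schedule the unpause with the reactor. Below is a minimal sketch of that idea; the PauseMiddleware name and PAUSE_SECONDS value are made up for illustration, and it assumes the engine.pause()/engine.unpause() methods (which Scrapy itself uses, e.g. in the shell) behave the same in your version:

# middlewares.py -- a sketch, not a drop-in tested implementation
from twisted.internet import reactor

class PauseMiddleware:
    PAUSE_SECONDS = 600  # 10 minutes, as the question asks

    def process_response(self, request, response, spider):
        engine = spider.crawler.engine
        # engine.paused is an internal flag; the guard avoids scheduling
        # a separate unpause for every 407 the 10 concurrent requests return
        if response.status == 407 and not engine.paused:
            # stop the engine from scheduling new requests
            engine.pause()
            # resume later; callLater does not block the reactor
            reactor.callLater(self.PAUSE_SECONDS, engine.unpause)
        return response

Either way, the 407 response itself is passed on, so you would likely also add 407 to RETRY_HTTP_CODES so the failed requests are retried once the crawl resumes.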
Granitosaurus