I want my running spider to be closed immediately, without processing any scheduled requests. I've tried the following approaches, to no avail:
- Raising CloseSpider in callback functions
- Calling spider.crawler.engine.close_spider(spider, 'reason') in a downloader middleware
The script is automated and runs several spiders in a loop. When the running spider meets a certain condition, I want it to close instantly and the program to continue with the remaining spiders in the loop.
Is there a way to drop the requests from the scheduler queue?
I have included a snippet where I'm trying to terminate the spider:
class TooManyRequestsMiddleware:
    def process_response(self, request, response, spider):
        if response.status == 429:
            # Rate limited: ask the engine to close the spider.
            spider.crawler.engine.close_spider(
                spider,
                f"Too many requests!, response status code: {response.status}",
            )
        elif "change_spotted" in spider.kwargs:
            # A change was flagged on the spider: close it as well.
            print("Attempting to close down the spider")
            spider.crawler.engine.close_spider(spider, "Spider is terminated!")
        return response