
I am using the Scrapy framework to crawl some websites. I want to stop crawling immediately once a flag I set is raised. In my pipeline I stop the crawler like this:

spider.crawler.engine.close_spider(spider, reason='My reason')

It stops when I want, but it keeps executing and still sends requests for the URLs remaining in the connection pool, which I don't want. How can I stop it immediately? Is there a way to clear the URLs from the connection pool?
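For context, a minimal sketch of a pipeline doing this kind of shutdown (the class name and the `stop_flag` item field are hypothetical, not from the original post):

```python
class StopFlagPipeline:
    """Hypothetical pipeline that asks the engine to close the spider
    when an item carries a stop flag."""

    def process_item(self, item, spider):
        if item.get("stop_flag"):  # hypothetical flag field
            # Ask Scrapy's engine to close this spider; note this only
            # stops *scheduling* new requests, it is not immediate.
            spider.crawler.engine.close_spider(spider, reason="My reason")
        return item
```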

Thank you in advance.

memeister

1 Answer

  1. According to the Scrapy docs, close_spider stops scheduling new requests; it does not stop the crawling process immediately. In your case close_spider worked exactly as documented.

  2. The only way I know to stop crawling immediately is to call os._exit, as in this answer.
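A minimal sketch of that approach (again, the class name and `stop_flag` field are hypothetical). Note that os._exit terminates the whole process at once, skipping Twisted's shutdown, pending-request draining, and any cleanup such as finally blocks or FEED exports:

```python
import os


class HardStopPipeline:
    """Hypothetical pipeline that kills the entire process on a flag."""

    def process_item(self, item, spider):
        if item.get("stop_flag"):  # hypothetical flag field
            # os._exit exits immediately without running cleanup handlers,
            # so no remaining queued requests are ever sent.
            os._exit(0)
        return item
```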

Georgiy