
I tried using the CLOSESPIDER_TIMEOUT extension setting to stop spiders that run for more than 3 hours.

CLOSESPIDER_TIMEOUT = 3 * 60 * 60

Although the spiders receive the close (timeout) request, they never actually stop and keep on running.

Any ideas on what's wrong in this case?
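
For reference, a minimal sketch of how the setting can be applied per spider (spider name and start URL are placeholders, not the actual project; the same value could also live in `settings.py`):

```python
import scrapy


class LongRunningSpider(scrapy.Spider):
    # placeholder spider; the real ones crawl for many hours
    name = "long_running"
    start_urls = ["https://example.com"]

    custom_settings = {
        # ask Scrapy to close the spider after 3 hours
        "CLOSESPIDER_TIMEOUT": 3 * 60 * 60,
    }

    def parse(self, response):
        # keep following links so the crawl runs long enough to hit the timeout
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```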

codehia

1 Answer


If your spider gets the close timeout request, the extension seems to be working. It doesn't look like anything is wrong, but you might have to wait a bit before the spider fully closes: it will first finish the already scheduled requests before shutting down completely.
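
For context, this is roughly what the built-in close-spider timeout does under the hood (a simplified sketch; class names and details vary by Scrapy version): it only *asks* the engine to close, so in-flight and already scheduled requests are still drained first.

```python
from twisted.internet import reactor
from scrapy import signals
from scrapy.exceptions import NotConfigured


class TimeoutCloseSketch:
    """Simplified stand-in for Scrapy's CloseSpider timeout behaviour."""

    def __init__(self, crawler, timeout):
        self.crawler = crawler
        self.timeout = timeout

    @classmethod
    def from_crawler(cls, crawler):
        timeout = crawler.settings.getfloat("CLOSESPIDER_TIMEOUT")
        if not timeout:
            raise NotConfigured
        ext = cls(crawler, timeout)
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        # After the timeout it merely *requests* a graceful close; the engine
        # then stops scheduling new requests but still waits for pending ones.
        reactor.callLater(
            self.timeout,
            self.crawler.engine.close_spider,
            spider,
            "closespider_timeout",
        )
```

So a long tail after the timeout usually points to a large backlog of pending requests or something blocking the close, not to a broken extension.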

Wim Hermans
  • That was my assumption as well. However, the spider runs for more than a day with a timeout of 3 hours, so I'm assuming something is wrong. Any ideas? – codehia Apr 30 '20 at 08:39
  • I can confirm that after `CLOSESPIDER_TIMEOUT` Scrapy will stop scheduling new requests, but it will continue executing the already scheduled ones. You can also check this [solution](https://stackoverflow.com/a/55877309/10884791): calling `os._exit(0)` (see the sketch below these comments) – Georgiy Apr 30 '20 at 10:47
  • Shouldn't take that long indeed. Did you define any close_spider methods in your pipelines/extensions, or an image/file download pipeline that is blocking? You can also try the telnet console (https://docs.scrapy.org/en/latest/topics/telnetconsole.html) when it looks blocked, to see whether the queues are still making progress – Wim Hermans May 01 '20 at 05:37
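
Building on Georgiy's suggestion, here is a hedged sketch of the hard-stop approach: an extra extension that force-exits the process at an absolute deadline even if the graceful close hangs. `HardKillSpider` and `HARDKILL_TIMEOUT` are made-up names for illustration, not built-in Scrapy settings.

```python
import os

from twisted.internet import reactor
from scrapy import signals
from scrapy.exceptions import NotConfigured


class HardKillSpider:
    """Force-exits the whole process at a deadline, even if closing hangs."""

    def __init__(self, deadline):
        self.deadline = deadline

    @classmethod
    def from_crawler(cls, crawler):
        # HARDKILL_TIMEOUT is an illustrative setting name, not a Scrapy one.
        deadline = crawler.settings.getfloat("HARDKILL_TIMEOUT")
        if not deadline:
            raise NotConfigured
        ext = cls(deadline)
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        return ext

    def spider_opened(self, spider):
        # os._exit() skips all cleanup, which is the point when a blocking
        # pipeline or a stuck download keeps the graceful close from finishing.
        reactor.callLater(self.deadline, os._exit, 0)
```

Enabled with something like the following (illustrative values, paths assume the extension lives in `myproject/extensions.py`):

```python
# settings.py
EXTENSIONS = {"myproject.extensions.HardKillSpider": 500}
CLOSESPIDER_TIMEOUT = 3 * 60 * 60   # graceful close after 3 hours
HARDKILL_TIMEOUT = 4 * 60 * 60      # hard exit one hour later as a safety net
```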