Suppose you execute a large script on Splash that needs to perform 5 minutes' worth of operations. With the correct timeout parameter this is entirely possible. Now suppose that, after Splash has started processing the script, you want to stop its execution prematurely, effectively killing it. Is there any way to tell Splash to do this? I'm not seeing anything documented in the API.

LaserJesus

1 Answer

I don't think Splash currently supports such a feature, nor do I think it is scheduled to be added in the near future.

However, if we don't limit the discussion to methods that are "documented by Splash", there is a (not too bad) approach:

When there's an ongoing Splash request that you want to stop before Splash finishes executing it, you can simply terminate the TCP connection that carries that specific request. Once the TCP connection is dropped, Splash terminates the ongoing script execution almost immediately.
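As a rough illustration of dropping the connection from the client side, here is a minimal sketch using only Python's standard library. It assumes the script is submitted as a JSON POST to Splash's `/execute` endpoint with `lua_source` and `timeout` arguments; the helper names `open_splash_request` and `abort_splash_request` are made up for this example and are not part of Splash or Scrapy. Whether Splash actually stops the script on disconnect is the behaviour described above, not something this snippet verifies.

```python
import json
import socket


def open_splash_request(host, port, lua_source, timeout=300):
    """Send a POST /execute request and return the still-open socket.

    Hypothetical helper: the response is deliberately not read, so the
    caller can decide later whether to wait for it or abort.
    """
    body = json.dumps({"lua_source": lua_source, "timeout": timeout}).encode("utf-8")
    head = (
        "POST /execute HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    sock = socket.create_connection((host, port))
    sock.sendall(head + body)
    return sock


def abort_splash_request(sock):
    """Drop the TCP connection without waiting for the response."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError:
        pass  # the peer may already have closed the connection
    finally:
        sock.close()
```

Usage would look like `sock = open_splash_request("localhost", 8050, script)` followed, at whatever point you want to give up, by `abort_splash_request(sock)`. Port 8050 is Splash's default; adjust the host and port for your setup. With a higher-level HTTP client the idea is the same: cancel or close whatever object owns the underlying connection.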

starrify
  • Are you talking about the TCP connection between the Scrapy instance and Splash? Do you happen to know how to issue such a termination from a Scrapy-based spider? I have access to the scrapy.http.Request object that initiated the call to Splash, and I have thread access in a signal_shutdown handler for Ctrl+C. – LaserJesus Nov 05 '18 at 01:33
  • _Are you talking about the TCP connection between the Scrapy instance and Splash_: Yes. Earlier I wasn't sure which tool you were using to access Splash (Scrapy or something else). I don't think Scrapy has anything in place specifically for dropping the TCP connection, but you could look into the underlying Twisted connection objects, where I believe a solution exists. – starrify Nov 05 '18 at 11:09
  • @LaserJesus I was (partially) wrong in my previous comment. Scrapy already handles this kind of task via [the download timeout downloader middleware](https://github.com/scrapy/scrapy/blob/06f2db7fd16ff26534c5e6a7d7e2a41a27451ee0/scrapy/downloadermiddlewares/downloadtimeout.py#L10), or, to be precise, in [the downloader](https://github.com/scrapy/scrapy/blob/06f2db7fd16ff26534c5e6a7d7e2a41a27451ee0/scrapy/core/downloader/handlers/http11.py#L337). So you just need to locate the specific deferred object inside the downloader slot queues and simulate the download timeout event. – starrify Nov 05 '18 at 12:16