In an IPython cluster, how can I gracefully interrupt a worker?

Question

I want to run some jobs in a cluster, but I want to be able to kill the job if it is taking too long. Can I do this gracefully from the client, and still have the worker available to do more jobs?

My scenario is that I want to investigate how different machine learning classifiers and hyperparameters affect the time it takes to run .fit(). If a fit takes too long, I just want to abandon the task and move on to the next one.

I can find the PIDs of the workers, and I can use kill() to send a signal from the client, but sending SIGINT, SIGHUP and SIGABRT all seem to ruthlessly kill the worker, not just interrupt it. I can't put any logic in the worker code because it's the atomic call to .fit() that I want to time and interrupt.

Are you on Windows? Sending SIGINT to the engine should trigger KeyboardInterrupt during execution on platforms other than Windows. It may, however, exit the engine if it is actually idle, rather than busy working on a task. — minrk, Oct 04 '16 at 14:00
It's on Linux. In the context of a (headless) worker process, I'm assuming SIGINT is not trapped (resulting in a KeyboardInterrupt) but instead results in termination. — Tony, Oct 05 '16 at 00:33
SIGINT is trapped in Python by default, and results in KeyboardInterrupt in user code if user code is running. It will stop the engine if no user code is running. [This notebook](http://nbviewer.jupyter.org/gist/minrk/74622276ea28fd1f7c08d851c91a1616) demonstrates interrupting an engine. If `.fit()` is interruptible, this will work. — minrk, Oct 05 '16 at 13:57
Thanks for the code. Inspired by it, and in a single IPython instance, I pressed Ctrl-C during a call to `.fit()`, and the whole IPython process died. It seems `.fit()` is not interruptible. — Tony, Oct 06 '16 at 04:17
That was on Windows. On Linux (my target) there is no brutal death, but `.fit()` refuses to be interrupted and just keeps processing until it ends, and then a `KeyboardInterrupt` exception is raised. The end result is the same; `.fit()` is uninterruptible it seems — Tony, Oct 06 '16 at 04:25
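The behaviour described in these comments can be reproduced without a cluster. The following is a minimal sketch, assuming a POSIX system such as Linux; it does not use ipyparallel at all. A plain subprocess stands in for an engine, and `WORKER_SOURCE`, `slow_task` and `run_demo` are hypothetical names chosen for illustration. The "client" sends SIGINT to the worker's PID while a pure-Python task is running, the task receives `KeyboardInterrupt`, and the worker process carries on.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for an engine process (not ipyparallel code).
WORKER_SOURCE = r"""
import signal, time

# Install Python's default SIGINT handler explicitly, in case the parent
# process launched us with SIGINT ignored.
signal.signal(signal.SIGINT, signal.default_int_handler)

def slow_task():
    # Pure-Python loop: the interpreter can raise KeyboardInterrupt
    # between bytecode instructions.
    deadline = time.time() + 20
    while time.time() < deadline:
        sum(range(1000))
    return "finished"

try:
    print("ready", flush=True)
    result = slow_task()
except KeyboardInterrupt:
    result = "interrupted"
print(result, flush=True)
# The process is still alive here and could pick up another task.
print("next task ok", flush=True)
"""


def run_demo(timeout=1.0):
    """Start the worker, interrupt it after `timeout` seconds, return its output."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the worker reports that it is about to start the task.
    assert worker.stdout.readline().strip() == "ready"
    time.sleep(timeout)
    # Same effect as os.kill(worker.pid, signal.SIGINT) from the client.
    worker.send_signal(signal.SIGINT)
    lines = worker.stdout.read().splitlines()
    returncode = worker.wait(timeout=30)
    return lines, returncode


if __name__ == "__main__":
    print(run_demo())
```

This only shows the interruptible case. Python runs its signal handlers between bytecode instructions, so a single long call into compiled code does not see the `KeyboardInterrupt` until it returns, which matches what was observed with `.fit()` on Linux.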

0 Answers