How to kill parallel execution of Databricks notebooks?

Question

I am currently using Python's threading module to parallelize the execution of multiple Databricks notebooks. These are long-running notebooks, and I need to add some logic for killing the threads when I want to restart the execution with new changes. When I re-execute the master notebook without killing the threads, the cluster quickly fills up with computationally heavy, long-lived threads, leaving little capacity for the computations that are actually required.

I have tried these suggestions without luck. Furthermore, I have tried getting the runId from dbutils.notebook.run() and killing the thread with dbutils.notebook.exit(runId), but since the call to dbutils.notebook.run() is synchronous, I am unable to obtain the runId before the notebook has executed.

I would appreciate any suggestion on how to solve this issue!

score 1 · Answer 1 · answered Jul 15 '20 at 00:31

Hello @sondrelv and thank you for your question. I would like to clarify that dbutils.notebook.exit(value) is not used for killing other threads by runId. dbutils.notebook.exit(value) is used to cause the current (this) thread to exit and return a value. I see the difficulty of management without an available interrupt inside notebook code. Given this limitation, I have tried to look for other ways to cancel the threads.

One way is to use other utilities to kill the thread/run.

Part of the difficulty in solving this is that threads/runs created through dbutils.notebook.run() are ephemeral runs. The Databricks CLI command databricks runs get --run-id <ephemeral_run_id> can fetch details of an ephemeral run. If details can be fetched, then the cancel should also work (databricks runs cancel ...).

The remaining difficulty is getting the run ids. Ephemeral runs are excluded from the CLI runs list operation databricks runs list.
As you noted, dbutils.notebook.run() is synchronous and does not return a value to the calling code until it finishes. However, in the notebook UI, the run ID and link are printed when the run starts. There must be a way to capture these. I have not yet found how.
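
If a run ID can be obtained somehow, the cancel does not have to go through the CLI. The following is a minimal sketch, assuming the Jobs REST API endpoint /api/2.0/jobs/runs/cancel accepts ephemeral run IDs (I have not verified this for runs started by dbutils.notebook.run()). The function names, the workspace URL, and the token are hypothetical placeholders.

```python
import json
import urllib.request


def build_cancel_request(host: str, token: str, run_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Jobs API runs/cancel endpoint.

    host and token are placeholders for your workspace URL and a personal
    access token; whether ephemeral runs can be cancelled this way is an
    assumption, not something confirmed in this thread.
    """
    url = f"{host.rstrip('/')}/api/2.0/jobs/runs/cancel"
    body = json.dumps({"run_id": run_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def cancel_run(host: str, token: str, run_id: int) -> int:
    """Send the cancel request and return the HTTP status code."""
    with urllib.request.urlopen(build_cancel_request(host, token, run_id)) as resp:
        return resp.status
```

Calling cancel_run(...) once per collected run ID from the master notebook, before restarting, would play the same role as databricks runs cancel.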

Another possible solution would be to create some endpoint or resource that the child notebooks check to decide whether they should continue execution or exit early using dbutils.notebook.exit(). Doing this check every few cells would be similar to the approaches in the article you linked, just applied to a notebook instead of a thread.
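
To illustrate the check-and-exit idea, here is a rough sketch that uses a flag file as the shared resource. The flag path, the helper names, and the choice of a file (rather than a table or an HTTP endpoint) are my own assumptions for illustration. The master notebook would create the file when it wants the children to stop.

```python
import os
import tempfile


def should_stop(flag_path: str) -> bool:
    """Return True once the stop-flag file exists (hypothetical convention)."""
    return os.path.exists(flag_path)


def run_steps(steps, flag_path: str):
    """Run callables in order, checking the flag before each one.

    Returns the names of the steps that actually ran, so the caller can
    see where execution was cut short.
    """
    completed = []
    for step in steps:
        if should_stop(flag_path):
            break  # the master notebook asked us to stop early
        step()
        completed.append(step.__name__)
    return completed


# In a child notebook, the equivalent check between cells might look like:
#   if should_stop(FLAG_PATH):          # FLAG_PATH is a placeholder
#       dbutils.notebook.exit("cancelled")
```

The check only takes effect between steps, so a single long-running cell will still run to completion before the notebook exits.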

Thanks for the answer, and thank you for your suggestions and the clarification around usage of dbutils.notebook.exit(value). I'll do some more research on obtaining the list of runIds! — sondrelv, Aug 03 '20 at 06:40
