I am currently using Python's threading module to parallelize the execution of multiple Databricks notebooks. These are long-running notebooks, and I need to add logic for killing the threads when I want to restart the execution with new changes. When the master notebook is re-executed without killing the threads, the cluster quickly fills up with computationally heavy, long-lived threads, leaving little room for the computations that are actually required.
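For context, this is roughly what the setup looks like. The notebook paths and the run_notebook stand-in are hypothetical; on a real cluster the stand-in would be dbutils.notebook.run(path, timeout), which is only available inside Databricks:

```python
import threading

# Stand-in for dbutils.notebook.run(path, timeout): on Databricks the real
# call blocks until the child notebook finishes and returns its exit value.
def run_notebook(path, timeout_seconds=0):
    return f"finished {path}"

# Hypothetical notebook paths for illustration.
notebook_paths = ["/notebooks/job_a", "/notebooks/job_b"]
results = {}

def worker(path):
    # Synchronous call: this thread is blocked until the notebook completes.
    results[path] = run_notebook(path, timeout_seconds=0)

threads = [threading.Thread(target=worker, args=(p,)) for p in notebook_paths]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Once started, these threads have no built-in way to be interrupted from the master notebook, which is exactly the problem.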
I have tried these suggestions without luck. I have also tried getting the runId from dbutils.notebook.run() and killing the thread with dbutils.notebook.exit(runId), but since the call to dbutils.notebook.run() is synchronous, the runId only becomes available after the notebook has finished executing.
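To illustrate why that attempt fails, here is a minimal sketch (again with a local stand-in for the Databricks call, since dbutils only exists on a cluster; the path and run id are made up):

```python
import threading

def notebook_run(path):
    # Stand-in for dbutils.notebook.run(path, 0): blocks for the full
    # duration of the notebook and only then returns a value.
    return "run-12345"  # the run id is only known AFTER completion

captured = {}

def worker(path):
    run_id = notebook_run(path)   # blocks until the notebook finishes...
    captured["run_id"] = run_id   # ...so by now there is nothing left to cancel

t = threading.Thread(target=worker, args=("/notebooks/job_a",))
t.start()
t.join()
```

In other words, the id needed to cancel the run is produced by the very call that has to be cancelled, so it arrives too late to be useful.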
I would appreciate any suggestions on how to solve this issue!