I have some scheduled data pipelines that are orchestrated via Azure Data Factory, each with a Databricks activity that runs on a job cluster.
All my Databricks activities are stuck in retry loops and failing with the following error:
Databricks execution failed with error state: InternalError, error message: Unexpected failure while waiting for the cluster <cluster-id> to be ready.Cause Cluster <cluster-id> is unusable since the driver is unhealthy.
The job cluster never even starts up.
This issue is quite similar to the one posted here:
AWS Databricks cluster start failure
However, there are a few differences:
- My pipelines are running on Azure: Azure Data Factory and Azure Databricks
- I can spin up my interactive clusters (in the same workspace) without any problem
- I have checked with colleagues who run similar pipelines on different subscriptions (in the same region), and they are not facing any issues
Any idea what is going on here? Is it just a service interruption of some sort, or is there something I can do to resolve this?