
I have noticed that my orchestrated pipelines (in ADF) sometimes fail due to this error:


ErrorCode=FailedToReadFromAzureDatabricksDeltaLake,Failed to read from azure databricks delta lake.
Error message : Failed to send request to Azure Databricks Cluster. 
Operation: GetContextStatus.
Error: UnexpectedHttpException: Got invalid response: 404..

This occurs intermittently rather than on every run.
I'm looking for more information on how to resolve or prevent it.

I suspect it might be overload on the cluster and that increasing the specs might solve it. However, the error message is quite vague. For context, multiple pipelines use the same cluster for ingestion and ETL. Maybe it's better to segregate the pipelines across multiple clusters to prevent overload / concurrency issues?

1 Answer


If you are using an interactive cluster, multiple jobs running on it can make the driver unresponsive, which in turn results in errors like this one; a cluster restart may let the jobs run without issues again.

  • Job clusters are recommended over interactive clusters whenever multiple jobs would otherwise share a single cluster, especially in production scenarios; a sketch of the corresponding ADF linked service follows the reference below.

Reference: https://docs.databricks.com/jobs.html
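As a rough illustration, when the caller is a Databricks notebook activity in ADF (as in your case), the switch to a job cluster happens in the AzureDatabricks linked service: instead of referencing an interactive cluster via existingClusterId, you supply the newCluster* properties and ADF provisions a fresh job cluster per activity run. This is a minimal sketch; the workspace URL, runtime version, node type, and worker range are placeholders, not recommendations:

    {
        "name": "AzureDatabricksJobClusterLS",
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": "https://<your-workspace>.azuredatabricks.net",
                "accessToken": {
                    "type": "SecureString",
                    "value": "<access-token>"
                },
                "newClusterVersion": "10.4.x-scala2.12",
                "newClusterNodeType": "Standard_DS3_v2",
                "newClusterNumOfWorker": "2:8"
            }
        }
    }

With this in place, each run gets its own short-lived cluster that is torn down afterwards, so a hung driver on one run cannot affect the next. Note that newClusterNumOfWorker accepts either a fixed count ("4") or a "min:max" range for autoscaling.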

To overcome this issue, you may also try the approaches below, which require some cluster configuration changes (a combined example follows the list):

  • In the cluster Spark configuration, set spark.cleaner.periodicGC.interval to about 10 or 30 minutes, depending on how frequently you would like the cleanup to occur. The context cleaner triggers cleanups only when weak references are garbage collected; in long-running applications with large driver JVMs, where there is little memory pressure on the driver, this may happen only occasionally or not at all. Not cleaning at all may lead to executors running out of disk space after a while.

  • You can also enable autoscaling on the cluster, raising the maximum number of workers, or use more powerful VM types. That allocates more memory for your job executions.
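For reference, both of these suggestions can be expressed together in a single cluster definition, for example as the JSON body of a Databricks Clusters API create/edit call. This is only a sketch; the 15-minute interval, node type, and worker counts are illustrative assumptions:

    {
        "cluster_name": "adf-shared-etl",
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS4_v2",
        "autoscale": {
            "min_workers": 2,
            "max_workers": 10
        },
        "spark_conf": {
            "spark.cleaner.periodicGC.interval": "15min"
        }
    }

Equivalently, in the cluster UI you can enter spark.cleaner.periodicGC.interval 15min under Advanced Options > Spark and set the worker range in the autoscaling fields.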

– Kranthi Pakala
  • Thanks for the response. I have looked into job clusters, and it seems like the right approach. However, I have one ADF pipeline per table in my silver and gold layers for transformation logic, so I fear using job clusters will kick off too many clusters at once. Wouldn't this be pricey? Or do job clusters distribute the jobs evenly? If, for instance, I have 40 pipelines each running daily at the same time with one notebook call, would that start 40 job clusters, each charging X DBU per hour? – Oscar Dyremyhr Sep 29 '22 at 09:24