
[Cross-posting from the Databricks community forum: link]

I have been working on a POC exploring Delta Live Tables with a GCS storage location.

I have a question:

How do we access the GCS bucket? The connection has to be established using a Databricks service account. When creating a normal cluster, we go to the cluster page and, under Advanced Options, provide the Databricks service account email. For Delta Live Tables the cluster creation is not under our control, so how do we add this email to the cluster to make the GCS bucket path accessible?
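For context, this is the setting I mean. On a regular cluster, if I'm reading the cluster JSON correctly, the email goes under `gcp_attributes.google_service_account`, roughly like this (all values below are placeholders):

```json
{
  "cluster_name": "poc-cluster",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "n2-standard-4",
  "num_workers": 2,
  "gcp_attributes": {
    "google_service_account": "<databricks-sa>@<project-id>.iam.gserviceaccount.com"
  }
}
```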

I also tried to edit the Delta Live Tables cluster from the UI by adding the service account under the Google Service Account block. Saving the cluster failed with:

**Error: Dlt prefixed spark images cannot be used outside of Delta live tables service**

Here is the error log I'm encountering when I provide a gs:// bucket path as the storage location for the Delta Live Tables pipeline:

DataPlaneException: Failed to start the DLT service on cluster <cluster_id>. Please check the stack trace below or driver logs for more details.
com.databricks.pipelines.execution.service.EventLogInitializationException: Failed to initialize event log
java.io.IOException: Error accessing gs://<path>
shaded.databricks.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/<path>?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "Caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}
shrutis23
  • it should be similar to this answer: https://stackoverflow.com/questions/73532480/delta-live-table-able-to-write-to-adls/73540298#73540298 – Alex Ott Nov 26 '22 at 13:10

2 Answers


We are working to add more options for configuring storage permissions to the UI. In the meantime, you can control all of the properties of the clusters used by DLT by editing the JSON settings for your pipeline: https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-configuration.html#cluster-configuration
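For example, assuming the `google_service_account` field from the Clusters API's `gcp_attributes` is also accepted in the pipeline's cluster settings (the bucket, email, and notebook path below are placeholders), the pipeline JSON would look roughly like:

```json
{
  "name": "my-gcs-dlt-pipeline",
  "storage": "gs://<bucket>/dlt/storage",
  "clusters": [
    {
      "label": "default",
      "gcp_attributes": {
        "google_service_account": "<databricks-sa>@<project-id>.iam.gserviceaccount.com"
      }
    }
  ],
  "libraries": [
    { "notebook": { "path": "/Repos/<user>/dlt_pipeline_notebook" } }
  ]
}
```

The `default` label applies the setting to the pipeline's update cluster; a second entry with the `maintenance` label can be added if the maintenance cluster also needs access to the bucket.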

Michael Armbrust

First, mount the GCS bucket to a DBFS location; please see the doc:

Mount Storage Account
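A minimal sketch of what that mount might look like from a notebook, assuming the Google service account attached to the cluster already has access to the bucket (bucket and mount names are placeholders):

```python
# Runs in a Databricks notebook; `dbutils` and `display` are provided by the runtime.
bucket_name = "my-gcs-bucket"    # placeholder bucket name
mount_point = "/mnt/gcs-data"    # placeholder DBFS mount path

# Mount the GCS bucket onto DBFS; authentication is handled by the
# Google service account attached to the cluster.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(f"gs://{bucket_name}", mount_point)

# Quick check that the mounted path is readable.
display(dbutils.fs.ls(mount_point))
```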

To learn how to use DLT to build and deploy SQL and Python pipelines and run ETL workloads directly on your lakehouse on Google Cloud, please see the doc:

Databricks Doc
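As a rough illustration of the kind of pipeline that doc describes, a minimal DLT notebook in Python might look like this (the GCS path, table names, and `event_id` column are placeholders):

```python
import dlt
from pyspark.sql.functions import col

# Placeholder GCS path; adjust to your bucket layout.
SOURCE_PATH = "gs://my-gcs-bucket/raw/events/"

@dlt.table(
    name="raw_events",
    comment="Raw JSON events read from the GCS bucket."
)
def raw_events():
    # Batch read of JSON files from GCS; Auto Loader could be used instead.
    return spark.read.format("json").load(SOURCE_PATH)

@dlt.table(
    name="clean_events",
    comment="Events with a non-null id."
)
def clean_events():
    # Downstream transformation that reads the live table defined above.
    return dlt.read("raw_events").where(col("event_id").isNotNull())
```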

Sandy