2

I'm trying to set up a connection between Databricks and Azure Data Lake Storage Gen2 using the Unity Catalog External Locations feature.

Assumptions:

  1. ADLS is behind a private endpoint.

  2. The Databricks workspace is in a private VNet. I've added the workspace's private and public subnets to the ADLS account under "Firewalls and virtual networks" (service endpoint).

  3. I've granted ACLs to the service principal at the container level of the storage account.

After creating a service principal with the Storage Blob Data Contributor role (I've also tried the Storage Blob Data Owner, Storage Account Contributor and Contributor roles) and creating a storage credential with an External Location associated with it, I get an error:

Error in SQL statement: UnityCatalogServiceException: [RequestId=6f9a0a07-513c-45a5-b2aa-a67dd7d7e662 ErrorClass=INVALID_STATE] Failed to access cloud storage: AbfsRestOperationException

On the other hand, after creating a mount using the same service principal, I am able to connect to the storage and read/write data.

Do you have any ideas?

When I connect to ADLS using a managed identity with the "Access Connector", the problem goes away, but that feature is currently in public preview:

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/azure-managed-identities

repcak
  • 113
  • 8
  • I found this in the documentation (https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials): "You can use either an Azure managed identity or a service principal as the identity that authorizes access to your storage container. Managed identities are strongly recommended. They have the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and they remove the need to manage and rotate secrets." So we can only use a managed identity, which is in public preview. – repcak Jan 09 '23 at 15:13

2 Answers

2

I have the same issue. I noticed that when the storage account network firewall is disabled on the data lake, it works using the service principal as the storage credential. I tried adding the Databricks public IP addresses found here, but that failed as well. I'm not sure how to discover which IP address Unity Catalog uses to connect to the storage account. I have raised a support ticket with Microsoft and Databricks and will update once I hear more.

Frits
  • 21
  • 1
  • 1
    Both MS and Databricks have come back and said that private access to the storage account using a service principal is not supported. You will need to create a managed identity. Feature request? – Frits Jan 15 '23 at 21:54
  • Thanks. I think Unity Catalog should not have been announced as GA. The External Location feature is critical, and the only way to connect to a private ADLS account (which every company requires) is a managed identity, which is in public preview. So is it really GA? Did they share any information about GA of managed identities in Unity Catalog? – repcak Jan 16 '23 at 15:17
0

I fixed the issue by creating two Databricks access connectors: one for accessing the metastore storage account and the other for accessing the data lake storage account.
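A rough sketch of what such a setup might look like, written in Python purely as string builders. Every name here (subscription ID, resource group, connector names, storage account, container, credential and location names) is a hypothetical placeholder and not taken from this thread. It assumes a storage credential has already been created from the data lake connector, and it only prints the connector resource IDs and a Databricks SQL `CREATE EXTERNAL LOCATION` statement; it does not call Azure or Databricks.

```python
def access_connector_id(subscription_id: str, resource_group: str, name: str) -> str:
    """Build the Azure resource ID of a Databricks access connector."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{name}"
    )


def abfss_url(container: str, storage_account: str, path: str = "") -> str:
    """Build the abfss:// URL for a container (and optional path) in ADLS Gen2."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def create_external_location_sql(location: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement for an existing storage credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{location}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


if __name__ == "__main__":
    # Hypothetical placeholders: one connector per storage account.
    sub, rg = "00000000-0000-0000-0000-000000000000", "my-rg"
    print(access_connector_id(sub, rg, "metastore-connector"))
    print(access_connector_id(sub, rg, "datalake-connector"))

    # "datalake_cred" is assumed to be a storage credential backed by datalake-connector.
    url = abfss_url("raw", "mydatalake")
    print(create_external_location_sql("raw_zone", url, "datalake_cred"))
```

Each connector's managed identity would still need a role such as Storage Blob Data Contributor on its own storage account, and the connector has to be allowed through that account's network rules.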