
We're migrating from blob storage to ADLS Gen 2, and we want to test access to the Data Lake from Databricks. I created a service principal which has the Storage Blob Data Reader and Storage Blob Data Contributor roles on the Data Lake.

My notebook sets the below spark config:

 spark.conf.set("fs.azure.account.auth.type","OAuth")
 spark.conf.set("fs.azure.account.oauth.provider.type","org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
 spark.conf.set("fs.azure.account.oauth2.client.id","<clientId")
 spark.conf.set("fs.azure.account.oauth2.client.secret","<secret>")
 spark.conf.set("fs.azure.account.oauth2.client.endpoint","https://login.microsoftonline.com/<endpoint>/oauth2/token")
//I replaced the values in my notebook with correct values from my service principal
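For reference, the same settings can also be scoped to a single storage account, which avoids clashes when one cluster talks to several accounts. A minimal sketch in Python; every angle-bracket value is a placeholder:

 # Sketch only: the same OAuth settings, scoped to one storage account.
 # All angle-bracket values are placeholders for your own values.
 account = "<storage account name>.dfs.core.windows.net"
 spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
 spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}",
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
 spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", "<client-id>")
 spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}", "<secret>")
 spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}",
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token")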

When I run the code below, the contents of the directory are listed correctly:

dbutils.fs.ls("abfss://ado-raw@<storage account name>.dfs.core.windows.net")

I can read a small text file from my Data Lake (it's only 3 bytes), but when I try to show its contents, the cell gets stuck at "Running command..." and nothing happens.
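For context, the failing step looks something like the sketch below; <path-to-file> is a hypothetical placeholder, and dftest is the DataFrame name mentioned in the comments:

 # Sketch of the failing step; <path-to-file> is a placeholder.
 dftest = spark.read.text("abfss://ado-raw@<storage account name>.dfs.core.windows.net/<path-to-file>")
 dftest.show()  # the read itself succeeds, but this display step hangs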

What do you think the issue is, and how do I resolve it?

Thanks in advance

Morez
  • I had the same issue, and in my case it happened because I created a private endpoint to connect ADLS Gen 2 to Data Factory. There isn't any documentation explaining why this happens; I discovered it on my own. Also check whether your Databricks cluster is allowed through the ADLS Gen 2 firewall. – Kafels Aug 18 '21 at 11:34
  • Did you follow this step about [creating a container](https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html#create-a-container)? – Kafels Aug 18 '21 at 11:40
  • @Kafels The thing is, the values are being read from the Data Lake, but they are not displayed. Access is not the issue. – Morez Aug 18 '21 at 11:47
  • I don't think so; your stage's job is frozen at 0/1. – Kafels Aug 18 '21 at 11:50
  • @Kafels It's frozen when showing it, but if you look at the image above, you can see the data is fetched and stored in dftest. – Morez Aug 18 '21 at 11:52
  • Wait until your command throws an exception, then update your question. It will probably run for 20 minutes before stopping. – Kafels Aug 18 '21 at 12:00
  • @Kafels I'll give it a try, thanks. – Morez Aug 18 '21 at 12:01
  • @Kafels It's still frozen, with no errors and nothing running! – Morez Aug 18 '21 at 13:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236141/discussion-between-kafels-and-morez). – Kafels Aug 18 '21 at 13:41

1 Answer


The issue was that the private and public subnets had been deleted by mistake and then recreated with a different IP range. They need to be in the same range as the management subnet; otherwise, the private endpoint set up for the storage account won't work.
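A quick way to sanity-check a private endpoint from a notebook is to resolve the storage hostname on the cluster itself: with a working Private Link DNS setup, it should return a private IP inside the subnets' range. A minimal sketch, with the hostname as a placeholder:

 import socket

 # If the private endpoint and its DNS zone are healthy, this should
 # resolve to a private IP inside the recreated subnets' address range;
 # a public IP here means traffic is not going through the endpoint.
 print(socket.gethostbyname("<storage account name>.dfs.core.windows.net"))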

Morez