I am using Azure Databricks to create a Delta table in Azure Blob Storage with ADLS Gen2, but I am getting the error "Failure to initialize configuration: Invalid configuration value detected for fs.azure.account.key" on the last line:

%python
spark.conf.set(
    "fs.azure.account.oauth2.client.secret",
    "<storage-account-access-key>")
friends = spark.read.csv('myfile/fakefriends-header.csv',
   inferSchema = True, header = True)
friends.write.format("delta").mode('overwrite')\
   .save("abfss://tempfile@tempaccount.dfs.core.windows.net/myfile/friends_new")

Please help me out: how can I avoid this error?

Alex Ott
Nabia Salman

3 Answers

Short answer: you can't use a storage account access key to access data via the abfss protocol. You need to provide more configuration options if you want to use abfss; it's all described in the documentation:

spark.conf.set(
  "fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", 
  "OAuth")
spark.conf.set(
  "fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net", 
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(
  "fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", 
  "<application-id>")
spark.conf.set(
  "fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", 
  dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
spark.conf.set(
  "fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", 
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")
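Since all five settings repeat the same `<storage-account-name>.dfs.core.windows.net` suffix, a small helper (a hypothetical plain-Python sketch, not part of the Databricks API) can build the whole map once and reduce the chance of a typo:

```python
# Sketch: build the five OAuth config keys for one storage account.
# All argument values below are placeholders, not working credentials.
def oauth_confs(account, client_id, secret, tenant_id):
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# On a cluster you would then apply them, e.g.:
# for k, v in oauth_confs("<storage-account-name>", "<application-id>",
#                         dbutils.secrets.get("<scope-name>", "<service-credential-key-name>"),
#                         "<directory-id>").items():
#     spark.conf.set(k, v)
```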

A storage access key can be used directly only with the wasbs protocol, and that's not recommended with ADLS Gen2.
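For comparison, a key-based wasbs setup would look roughly like this (a sketch with placeholder account, container, and key names; wasbs is the legacy Blob driver and is deprecated for ADLS Gen2):

```python
# Sketch only: legacy wasbs access with an account key.
# <storage-account-name>, <container>, <access-key> are placeholders.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
    "<access-key>")
df = spark.read.csv(
    "wasbs://<container>@<storage-account-name>.blob.core.windows.net/myfile/fakefriends-header.csv",
    inferSchema=True, header=True)
```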

P.S. You can also use passthrough cluster if you have permissions to access that storage account.

Alex Ott
  • it was not clear to me from the linked official documentation, that `storage access key` could be used only when you're using `wasbs`; how did you derive the info? I know wasbs is not recommended to be used any more. – soMuchToLearnAndShare Jul 18 '22 at 10:43
  • i assume the above "client.id." requires registering an app in Azure? – soMuchToLearnAndShare Jul 18 '22 at 10:45
  • and i found this section does not really work https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage#--direct-access-using-abfs-uri-for-blob-storage-or-azure-data-lake-storage-gen2 – soMuchToLearnAndShare Jul 18 '22 at 10:59
  • it works just fine... using really regularly – Alex Ott Jul 18 '22 at 12:11
  • I meant the section: `If you have properly configured credentials to access your Azure storage container, you can interact with resources in the storage account using URIs. Databricks recommends using the abfss driver for greater security.` How should the security be configured? The documentation does not say in that section. I know other sections talk about SAS and OAuth, like your answer about OAuth with app registration. – soMuchToLearnAndShare Jul 20 '22 at 06:14
  • Security here means that the ABFSS driver uses TLS 1.2 by default, plus it relies on short-lived OAuth tokens compared to a storage key or SAS – Alex Ott Jul 20 '22 at 07:11
  • I think the problem with the linked documentation is that it lacks an example for each type of auth, since there are combinations of method and protocol that do not work, but this is not explicitly stated in the docs, e.g. "Storage access key could be used only when you're using wasbs" is not on that page. – HansHarhoff Oct 16 '22 at 15:56
A few months later, but try the following code in your notebook:

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.<account name>.dfs.core.windows.net", "<account key>")
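If that key is picked up, a direct abfss read should then work; a hypothetical follow-up (reusing the placeholder account and container names from the question) might look like:

```python
# Sketch only: read after registering the account key at the Hadoop level.
# tempaccount/tempfile are the question's placeholders, not real resources.
friends = spark.read.csv(
    "abfss://tempfile@tempaccount.dfs.core.windows.net/myfile/fakefriends-header.csv",
    inferSchema=True, header=True)
```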
CMonte2
This error can also happen if the storage account name is mistyped (as in my case), i.e. check that the one set in

spark.conf.set("fs.azure.account.oauth.provider.type.<storageAccountName>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")

is the same as the one you use in your ``select * from parquet.`abfss://...@<storageAccountName>...` `` statement or other Spark action.
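One way to avoid such a mismatch (a plain-Python sketch with placeholder names, not from the answer) is to derive both the config key and the abfss URI from a single variable, so the account name cannot drift between them:

```python
# Sketch: one variable feeds both the Spark config key and the abfss URI.
# tempaccount/tempfile are placeholders, not real resources.
storage_account = "tempaccount"
container = "tempfile"

conf_key = f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net"
base_uri = f"abfss://{container}@{storage_account}.dfs.core.windows.net"

# On a cluster you would then use them together, e.g.:
# spark.conf.set(conf_key,
#     "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
# spark.read.format("delta").load(f"{base_uri}/myfile/friends_new")
```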

Khalid Mammadov