I am using (well... trying to use) Azure Databricks and I have created a notebook.
I would like the notebook to connect to my Azure Data Lake (Gen1) and transform the data. I followed the documentation and put this code in the first cell of my notebook:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "**using the application ID of the registered application**")
spark.conf.set("dfs.adls.oauth2.credential", "**using one of the registered application keys**")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/**using my-tenant-id**/oauth2/token")
dbutils.fs.ls("adl://**using my data lake uri**.azuredatalakestore.net/tenantdata/events")
The execution fails with this error:
com.microsoft.azure.datalake.store.ADLException: Error enumerating directory /
Operation null failed with exception java.io.IOException : Server returned HTTP response code: 400 for URL: https://login.microsoftonline.com/using my-tenant-id/oauth2/token Last encountered exception thrown after 5 tries.
[java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException,java.io.IOException] [ServerRequestId:null]
    at com.microsoft.azure.datalake.store.ADLStoreClient.getExceptionFromResponse(ADLStoreClient.java:1169)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectoryInternal(ADLStoreClient.java:558)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:534)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:398)
    at com.microsoft.azure.datalake.store.ADLStoreClient.enumerateDirectory(ADLStoreClient.java:384)
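The 400 in the trace comes from the login.microsoftonline.com token endpoint, not from the Data Lake itself, so one way to narrow this down might be to request a token outside Spark with the same three values and read the error body. Below is a minimal sketch of that check, not anything from the Databricks documentation. The helper names `build_token_request` and `request_token` are hypothetical, and the `resource` value `https://datalake.azure.net/` is my assumption for Data Lake Gen1.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Build the URL and form body for a client-credentials token request.

    Hypothetical helper for diagnosis only. It mirrors the refresh URL
    used in the spark.conf settings.
    """
    # Stray whitespace or slashes from copy/paste would change the URL path.
    if any(ch.isspace() or ch == "/" for ch in tenant_id):
        raise ValueError("tenant_id contains whitespace or '/'")
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumption: resource identifier for Azure Data Lake Store Gen1.
        "resource": "https://datalake.azure.net/",
    }).encode("utf-8")
    return url, body


def request_token(url, body):
    """POST the request and return (status, payload).

    On an HTTP error the response body is returned as text, since it
    normally carries the reason the token request was rejected.
    """
    req = urllib.request.Request(url, data=body)  # data present => POST
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

Calling `request_token(*build_token_request(tenant, app_id, key))` with the real values should return either a 200 with a token or the same 400 along with an error description. If it also returns 400, the problem is likely in the tenant ID, application ID, or key rather than in the folder permissions.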
I have given the registered application the Reader role on the Data Lake.
Question
How can I allow Spark to access the Data Lake?
Update
I have granted Read and Execute access on both the tenantdata and events folders.