I have two different storage accounts with the same container name. Let's say the storage accounts are named tenant1 and tenant2, each with a container named "appdata". I can create and mount both containers to DBFS. But I am unable to read/write dynamically by passing storage account names to the mount-point code: since DBFS uses /mnt/containername as the mount point, only the most recently mounted storage account's container is referenced in Databricks. How can I achieve my goal here?

Arjun R

1 Answer


Mount points should be static, so you just need two different mount points, each pointing at the correct container, something like this:

/mnt/storage1_appdata
/mnt/storage2_appdata

So if you want your code to be dynamic, build the path with an f-string: f"/mnt/{storage_name}_appdata".
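A minimal sketch of that pattern, assuming both accounts are already mounted under the /mnt/{storage_name}_appdata convention (the helper name and the example path are mine, not part of any API):

```python
def mount_path(storage_name: str, container: str = "appdata") -> str:
    """Build the DBFS mount path for a given storage account name."""
    return f"/mnt/{storage_name}_{container}"

# In a Databricks notebook you could then read from either account, e.g.:
# df = spark.read.parquet(mount_path("tenant1") + "/raw/events")
# df = spark.read.parquet(mount_path("tenant2") + "/raw/events")
```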

It's not recommended to dynamically remount containers - you can get cryptic errors if you remount a mount point while somebody is reading or writing data through it.

Also, you can access ADLS directly if you specify the correct configuration for your cluster/job (see doc) - you can even access both containers at the same time; you just need to set up the configuration for both storage accounts:

# Service-principal (OAuth) auth for one storage account;
# repeat this block with the other account's name to configure both.
spark.conf.set(
  "fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net",
  "OAuth")
spark.conf.set(
  "fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(
  "fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net",
  "<application-id>")
# Pull the client secret from a Databricks secret scope, not plain text
spark.conf.set(
  "fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
  dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"))
spark.conf.set(
  "fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
  "https://login.microsoftonline.com/<directory-id>/oauth2/token")
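With both accounts configured this way, you can address each container directly via its abfss:// URI instead of a mount point. A small sketch of building those URIs (the helper name and example paths are hypothetical):

```python
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build a direct ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>"""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

# Read the same-named container from both accounts in one job, e.g.:
# df1 = spark.read.parquet(abfss_path("appdata", "tenant1", "raw/events"))
# df2 = spark.read.parquet(abfss_path("appdata", "tenant2", "raw/events"))
```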
Alex Ott
  • Got the point. So /mnt/{mount_name} should always be unique in DBFS. Correct me if I'm wrong. – Arjun R Aug 26 '21 at 07:07
  • Yes, the mount point name is arbitrary - you can use any name, and you can even mount the same container multiple times. Usually people use the original container name just to make it clear where the data goes, but if you have identical container names, you need to add something unique, e.g. a combination of storage account and container name. – Alex Ott Aug 26 '21 at 07:12