
I have sent the Databricks logs to a storage account by enabling the diagnostic setting. Now I have to read those logs using Azure Databricks for advanced analytics. When I try to mount the path it works, but reads won't work.

Step 1:

containerName = "insights-logs-jobs"
storageAccountName = "smk"
config = "fs.azure.sas." + containerName + "." + storageAccountName + ".blob.core.windows.net"
sas = "sp=r&st=2021-12-07T08:07:08Z&se=2021-12-07T16:07:08Z&spr=https&sv=2020-08-04&sr=b&sig=3skdlskdlkf5tt3FiR%2FLM%3D"
spark.conf.set(config, sas)

Step 2:

df = spark.read.json("wasbs://insights-logs-jobs.gtoollogging.blob.core.windows.net/resourceId=/SUBSCRIPTIONS/xxxBD-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xxxx-xxx-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xxx-ADB/y=2021/m=12/d=07/h=00/m=00/*.json")


I am getting the below error:

 shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Unable to access container $root in account insights-logs-jobs.gtjjjng.blob.core.windows.net using anonymous credentials, and no credentials found for them  in the configuration.
    at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:796)
    at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorage.

I have tried many approaches but keep getting the above error.

anuj
  • Please make sure of the data format in the storage account. Cluster logs are mostly stored in Parquet format. – Karthikeyan Rasipalay Durairaj Nov 23 '21 at 15:37
  • No, it is JSON; the files are generated in a yy/mm/dd/hh folder layout. This is the path: resourceId=/SUBSCRIPTIONS/dklgd-3070-4AFD-A44C-3489956CE077/RESOURCEGROUPS/xyz-PROD-RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/xyz-PROCESS-PROD-ADB/y=2021/m=10/d=07/h=10/m=00/PT1H.JSON – anuj Nov 23 '21 at 15:41

2 Answers


With the help of the below code I was able to read the data from the Azure storage account using PySpark.

df = spark.read.json("wasbs://container_@storage_account.blob.core.windows.net/sub_folder/*.json")
df.show()

This gives me the complete data of all my JSON files in the output.

Or you can give it a try in the below way:

storage_account_name = "ACC_NAME"
storage_account_access_key = "ACC_key"

# Register the account key so Spark can authenticate against the storage account
spark.conf.set(
  "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
  storage_account_access_key)

file_type = "json"
file_location = "wasbs://container_name@storage_account_name.blob.core.windows.net/path"

df = spark.read.format(file_type).option("inferSchema", "true").load(file_location)
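
For the diagnostic-log layout in the question, this same approach should work once file_location points at the nested folder hierarchy. A minimal sketch, assuming the container is insights-logs-jobs and leaving the subscription, resource group, and workspace segments as placeholders:

# <sub-id>, <rg>, and <workspace> are placeholders to replace with real values;
# h=*/m=00/*.json globs over the hourly folders that diagnostic settings create.
file_location = ("wasbs://insights-logs-jobs@" + storage_account_name + ".blob.core.windows.net/"
                 "resourceId=/SUBSCRIPTIONS/<sub-id>/RESOURCEGROUPS/<rg>/PROVIDERS/"
                 "MICROSOFT.DATABRICKS/WORKSPACES/<workspace>/y=2021/m=12/d=07/h=*/m=00/*.json")

df = spark.read.format("json").option("inferSchema", "true").load(file_location)
df.printSchema()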
SaiKarri-MT
  • This syntax will work when you read from a regular storage account, but I am trying to read the logs sent to the logging storage account by enabling the diagnostics setting. I have already given the complete path as well. – anuj Nov 25 '21 at 04:58
  • Edited my question: I tried the SAS token approach but am still not able to read the Databricks logs present in the storage account. – anuj Dec 07 '21 at 10:47

This is the way Databricks mounts work.

If you attempt to create a mount point within an existing mount point, for example:

Mount one storage account to /mnt/storage1

Mount a second storage account to /mnt/storage1/storage2

Reason: this will fail because nested mounts are not supported in Databricks. The recommended approach is to create separate mount entries for each storage object, as in the example and sketch below.

For example:

Mount one storage account to /mnt/storage1

Mount a second storage account to /mnt/storage2
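
A minimal sketch of those two separate mounts, assuming account-key authentication; the container, account, and key names here are placeholders, not values from the question:

# Illustrative names only; replace with your own containers, accounts, and keys.
dbutils.fs.mount(
  source = "wasbs://container1@account1.blob.core.windows.net",
  mount_point = "/mnt/storage1",
  extra_configs = {"fs.azure.account.key.account1.blob.core.windows.net": account1_key})

dbutils.fs.mount(
  source = "wasbs://container2@account2.blob.core.windows.net",
  mount_point = "/mnt/storage2",  # a sibling of /mnt/storage1, not nested inside it
  extra_configs = {"fs.azure.account.key.account2.blob.core.windows.net": account2_key})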

You can refer to this: Link

As a workaround, you can read the logs directly from the storage account for processing instead of mounting it.
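
A minimal sketch of that direct read, reusing the SAS configuration from step 1 of the question. Note that the wasbs URI must take the form container@account: the URI in step 2 joins them with a dot, so the driver treats the whole string as an account name and falls back to anonymous credentials, which matches the "$root" error above. The subscription, resource group, and workspace segments are placeholders:

containerName = "insights-logs-jobs"
storageAccountName = "smk"
# Assumes the SAS token was already registered via spark.conf.set as in step 1.
# Note container@account, not container.account:
path = ("wasbs://" + containerName + "@" + storageAccountName + ".blob.core.windows.net/"
        "resourceId=/SUBSCRIPTIONS/<sub-id>/RESOURCEGROUPS/<rg>/PROVIDERS/"
        "MICROSOFT.DATABRICKS/WORKSPACES/<workspace>/y=2021/m=12/d=07/h=00/m=00/*.json")

df = spark.read.json(path)
df.show()

Also note that the SAS in the question is blob-scoped (sr=b) with read-only permission (sp=r); listing files under a wildcard generally needs a container-scoped token (sr=c) with both read and list permissions (sp=rl).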