
I'm looking to export Azure Monitor data from Log Analytics to a storage account and then read the JSON files into Databricks using PySpark.

The blob path for the Log Analytics export contains equals (=) signs, and Databricks throws an exception when using the path.

WorkspaceResourceId=/subscriptions/subscription-id/resourcegroups/<resource-group>/providers/microsoft.operationalinsights/workspaces/<workspace>/y=<four-digit numeric year>/m=<two-digit numeric month>/d=<two-digit numeric day>/h=<two-digit 24-hour clock hour>/m=<two-digit 60-minute clock minute>/PT05M.json

Log Analytics Data Export
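
Roughly, a sketch of the read I'm attempting (the container, account, and date parts below are placeholders, not my real values):

    path = ("wasbs://<container>@<storage-account>.blob.core.windows.net/"
            "WorkspaceResourceId=/subscriptions/<subscription-id>/resourcegroups/<resource-group>"
            "/providers/microsoft.operationalinsights/workspaces/<workspace>"
            "/y=2021/m=12/d=22/h=10/m=05/PT05M.json")
    df = spark.read.format("json").load(path)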

Is there a way to escape the equals sign so that the JSON files can be loaded from the blob location?

Lymedo
  • I have no problems reading paths containing `=` on DBFS backed by Azure Blob Storage (reading with `spark.read.format("json").load(path)`). Could you elaborate on how you are reading in the data and post the exception that you receive? – fskj Dec 22 '21 at 11:07
  • Strange. I even get an error when using dbutils.fs.ls(). Are you using blob storage or ADLS with hierarchical namespace? – Lymedo Dec 22 '21 at 15:15
  • I'm using ADLS with hierarchical namespace enabled. – fskj Dec 22 '21 at 15:25
  • Must be the blob endpoint. Will try it with ADLS HNS. – Lymedo Dec 22 '21 at 19:33

1 Answer


I tried a similar use case referring to the Microsoft Documentation; below are the steps:

  1. Mount the storage container. We can do this with Python code as below; make sure you pass all the parameters correctly, because incorrect parameters can lead to several different errors. (A filled-in sketch follows these steps.)

     dbutils.fs.mount(
         source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
         mount_point = "/mnt/<mount-name>",
         extra_configs = {"<conf-key>": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
    

    Below are the parameter descriptions:

    • <storage-account-name> is the name of your Azure Blob storage account.
    • <container-name> is the name of a container in your Azure Blob storage account.
    • <mount-name> is a DBFS path representing where the Blob storage container or a folder inside the container (specified in source) will be mounted in DBFS.
    • <conf-key> can be either fs.azure.account.key.<storage-account-name>.blob.core.windows.net or fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net
    • dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>") gets the key that has been stored as a secret in a secret scope.
  2. Then you can access those files as below:

     df = spark.read.text("/mnt/<mount-name>/...")
     df = spark.read.text("dbfs:/mnt/<mount-name>/...")
    

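Putting the above together for the Log Analytics export container, a sketch (the mount name, container, secret scope, and key below are made-up examples, not values from the export itself):

    # Hypothetical names throughout; substitute your own.
    dbutils.fs.mount(
        source = "wasbs://am-containerlog@<storage-account-name>.blob.core.windows.net",
        mount_point = "/mnt/loganalytics",
        extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
                         dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})

    # Reading a single five-minute export file directly avoids partition
    # discovery over the y=/m=/d=/h=/m= directories.
    df = spark.read.json(
        "/mnt/loganalytics/WorkspaceResourceId=<...>/y=2021/m=12/d=22/h=10/m=05/PT05M.json")

Because the file is addressed directly rather than found through a directory scan, the `=` signs in the path are treated as literal characters.
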
Also, there are multiple ways of accessing the files, all of which are clearly described in the doc.

And check this Log Analytics workspace doc to understand how to export the data to Azure Storage.
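
One caveat specific to this export layout: the path uses `m=` for both the month and the minute, so pointing Spark at a parent directory and letting it infer partitions can fail with a duplicate-column error. On Spark 3.0+ the `recursiveFileLookup` option skips partition inference altogether; a sketch, assuming the hypothetical mount above:

    # recursiveFileLookup walks all subdirectories and disables partition
    # inference, so the duplicate m= (month vs. minute) folders are ignored.
    df = (spark.read
          .option("recursiveFileLookup", "true")
          .json("/mnt/loganalytics/WorkspaceResourceId=<...>/"))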

SaiKarri-MT
  • the problem is that Microsoft uses the same name `m` for both months & minutes. As a result, when you read with partition inference, the Spark job fails because of the duplicate columns – Alex Ott Dec 29 '21 at 13:30