
I am connecting to a RESTful API using an Azure Synapse Analytics notebook and writing the JSON response to Azure Data Lake Storage Gen2.

PySpark code:

import json
import requests

# `spark` and `sc` are provided by the Synapse notebook session
response = requests.get('https://api.web.com/v1/data.json')
data = response.json()

# Load the JSON payload into a DataFrame via an RDD of JSON strings
df = spark.read.json(sc.parallelize([json.dumps(data)]))

account_name = "name of account"
container_name = "name of container"
relative_path = "name of file path"    # abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
adls_path = 'abfss://%s@%s.dfs.core.windows.net/%s' % (container_name, account_name, relative_path)

# Authenticate with the storage account key (not sure I'm doing the configuration right)
spark.conf.set('fs.azure.account.key.%s.dfs.core.windows.net' % account_name, "account_key")
df.write.mode("overwrite").json(adls_path)

Error:

Py4JJavaError: An error occurred while calling o536.json.
: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://storageaccount.dfs.core.windows.net/container/?upn=false&action=getAccessControl&timeout=90
paone
  • Do you have _Storage Blob Data Contributor_ permission? Your question is related to [this](https://learn.microsoft.com/en-us/answers/questions/38354/synapse-analytics-cant-connect-to-external-storage.html) discussion on Microsoft's forum – Kafels Sep 16 '21 at 01:01
  • Hi @Kafels, I do have Storage Blob Data Contributor permission and still run into this error. – paone Sep 16 '21 at 15:47
  • Is your storage networking set to private or restricted with a firewall? There is a Synapse Managed Private Endpoint that must be created for dfs to ensure the notebook making the request from the Spark pool can connect to storage through the Azure managed network's private endpoint, thus bypassing the storage firewall IP restrictions. – Josh Nov 03 '22 at 13:17

1 Answer


Note: Storage Blob Data Contributor is used to grant read/write/delete permissions on Blob storage resources.

If you do not assign Storage Blob Data Contributor to the users who access the storage account, they will not be able to access data in ADLS Gen2 because they lack permission on the storage account.

If they try to access data in ADLS Gen2 without the Storage Blob Data Contributor role on the storage account, they will receive the error message: Operation failed: "This request is not authorized to perform this operation.", 403.
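As a quick sanity check (a sketch, not from the original answer), you can try listing the container with mssparkutils, which ships with Synapse notebooks; the same 403 surfaces there if the identity lacks the role. The account and container names below are placeholders.

from notebookutils import mssparkutils

# Placeholder names; substitute your own container and storage account.
# A 403 on this call confirms the notebook's identity lacks data-plane access.
mssparkutils.fs.ls("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/")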

Once the storage account is created, select Access control (IAM) in the left navigation and check the existing role assignments; assign yourself the Storage Blob Data Owner role on the storage account if it is not already assigned.

After granting the Storage Blob Data Contributor role on the storage account, wait 5-10 minutes for the assignment to propagate, then retry the operation.
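Once the role has propagated, the original write should succeed without any account-key configuration, since Synapse notebooks typically pass your Azure AD identity through to the primary storage account. A minimal retry sketch, with placeholder names:

# Placeholder URI; substitute your own container, account, and path.
adls_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/output/data.json"
df.write.mode("overwrite").json(adls_path)

# Optional read-back to confirm the write succeeded
spark.read.json(adls_path).show(5)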


CHEEKATLAPRADEEP
  • Hi @CHEEKATLAPRADEEP-MSFT, I do have the Storage Blob Data Contributor role and not sure why I am running into this error. – paone Sep 16 '21 at 17:12
  • @Kafels, I was able to get it working after changing the role to Storage Blob Data Owner. Trying to find out why. – paone Sep 17 '21 at 00:47
  • Hi @paone did you ever figure out the root cause here? Inside a notebook the data writes correctly, but as soon as I run the Spark notebook through a pipeline it does not, and I have tried adding the Synapse Workspace as Data Contributor and Owner on the Storage Account AND ACLs – Rodney Feb 01 '22 at 05:29
  • Hi @Rodney, I wasn't able to get to the root cause. Beyond the limited troubleshooting I did, I couldn't figure out why, partly because the Azure administrators were busy. – paone Feb 02 '22 at 22:42
  • Ok, thanks. I think I may have a slightly different issue as we are deploying within a Managed Resource Group – Rodney Feb 03 '22 at 00:29