
I have a Spark application, and I want it to write its event log into an Azure Blob container.

I want to authenticate using a SAS token. The SAS token generated by the Azure portal works fine, but the one generated by the C# client does not, and I don't know what the difference between these two SAS tokens is.

This is how I generate the SAS token in the Azure portal:

(screenshot: SAS token generation in the Azure portal)

This is my Spark conf:

    spark.eventLog.dir: "abfss://sparkevent@lydevstorage0.dfs.core.windows.net/log"
    spark.hadoop.fs.azure.account.auth.type.lydevstorage0.dfs.core.windows.net: "SAS"
    spark.hadoop.fs.azure.sas.fixed.token.lydevstorage0.dfs.core.windows.net: ""
    spark.hadoop.fs.azure.sas.token.provider.type.lydevstorage0.dfs.core.windows.net: "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"

This is the C# code:

    BlobSasBuilder blobSasBuilder = new BlobSasBuilder()
    {
        StartsOn = DateTimeOffset.UtcNow.AddDays(-1),
        ExpiresOn = DateTimeOffset.UtcNow.AddDays(1),
        Protocol = SasProtocol.HttpsAndHttp,
        BlobContainerName = "sparkevent",
        Resource = "b" // I also tried "c"
    };
    blobSasBuilder.SetPermissions(BlobContainerSasPermissions.All);

    string sasToken2 = blobSasBuilder.ToSasQueryParameters(new StorageSharedKeyCredential("lydevstorage0", <access key>)).ToString();

The error is:

Exception in thread "main" java.nio.file.AccessDeniedException: Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD, https://lydevstorage0.dfs.core.windows.net/sparkevent/?upn=false&action=getAccessControl&timeout=90&sv=2021-02-12&spr=https,http&st=2023-06-26T03:33:27Z&se=2023-06-28T03:33:27Z&sr=c&sp=racwdxlti&sig=XXXXX
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1384)
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:611)
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:599)
        at org.apache.spark.deploy.history.EventLogFileWriter.requireLogBaseDirAsDirectory(EventLogFileWriters.scala:77)
        at org.apache.spark.deploy.history.SingleEventLogFileWriter.start(EventLogFileWriters.scala:221)
        at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:83)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:612)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
        at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
        at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD, https://lydevstorage0.dfs.core.windows.net/sparkevent/?upn=false&action=getAccessControl&timeout=90&sv=2021-02-12&spr=https,http&st=2023-06-26T03:33:27Z&se=2023-06-28T03:33:27Z&sr=c&sp=racwdxlti&sig=XXXXX
        at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:231)
        at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
        at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
        at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:911)
        at org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:892)
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:358)
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:932)
        at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:609)
        ... 23 more

Again, when I tried the SAS token generated in the Azure portal, it worked fine.

mifan

2 Answers


If you are using a Data Lake Storage Gen2 account with a hierarchical namespace, you can use the Azure.Storage.Files.DataLake package with the code below to create a SAS token in C#.

Code:

    using System;
    using Azure.Storage;
    using Azure.Storage.Files.DataLake;
    using Azure.Storage.Sas;

    namespace SAStoken
    {
        class Program
        {
            private static void Main()
            {
                var AccountName = "venkat098";
                var AccountKey = "";
                var FileSystemName = "filesystem1";

                // Authenticate with the account key and build a client for the DFS endpoint.
                StorageSharedKeyCredential key = new StorageSharedKeyCredential(AccountName, AccountKey);
                string dfsUri = "https://" + AccountName + ".dfs.core.windows.net";
                var dataLakeServiceClient = new DataLakeServiceClient(new Uri(dfsUri), key);
                var fileSystemClient = dataLakeServiceClient.GetFileSystemClient(FileSystemName);

                DataLakeSasBuilder sas = new DataLakeSasBuilder()
                {
                    FileSystemName = FileSystemName, // container name
                    Resource = "d",
                    IsDirectory = true,
                    ExpiresOn = DateTimeOffset.UtcNow.AddDays(7),
                    Protocol = SasProtocol.HttpsAndHttp,
                };
                // Note: this permission set has no ManageAccessControl flag;
                // see the comments and the second answer for the fix
                // (DataLakeFileSystemSasPermissions).
                sas.SetPermissions(DataLakeAccountSasPermissions.All);

                Uri sasUri = fileSystemClient.GenerateSasUri(sas);
                Console.WriteLine(sasUri);
            }
        }
    }

Output:

https://venkat098.dfs.core.windows.net/filesystem1?sv=2022-11-02&spr=https,http&se=2023-07-04T05%3A53%3A39Z&sr=c&sp=racwdl&sig=xxxxxx


I checked the URL against an image file in the container, and it works successfully:

https://venkat098.dfs.core.windows.net/filesystem1/cell_division.jpeg?sv=2022-11-02&spr=https,http&se=2023-07-04T05%3A53%3A39Z&sr=c&sp=racwdl&sig=xxxxx

(screenshot: the image opens in the browser)
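To plug the generated token into the Spark configuration from the question, note that the value of `fs.azure.sas.fixed.token` is the SAS query string itself, not the full URI (the Hadoop examples omit the leading `?`). A minimal sketch, assuming the `sasUri` variable from the code above:

    // Uri.Query returns the query string including the leading '?';
    // strip it to get a bare SAS token.
    string sasToken = sasUri.Query.TrimStart('?');
    Console.WriteLine(sasToken);
    // Set the printed value as:
    //   spark.hadoop.fs.azure.sas.fixed.token.<account>.dfs.core.windows.net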

Reference:

Use .NET to manage data in Azure Data Lake Storage Gen2 - Azure Storage | Microsoft Learn

Venkatesan
  • Thanks for the answer @Venkatesan. But I still get this error: ```https://lydevstorage0.dfs.core.windows.net/sparkevent/log/spark-738c5488436d4a71a0b8957677ff177b.inprogress?action=setAccessControl&timeout=90&sv=2022-11-02&spr=https,http&se=2023-07-04T06:40:28Z&sr=c&sp=racwdl&sig=XXXXX, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:093e86a5-c01f-0048-48c2-a8ecfc000000 Time:2023-06-27T06:41:35.3197275Z"``` – mifan Jun 27 '23 at 06:45
  • Seems this SAS doesn't have the permission to perform *setAccessControl*. – mifan Jun 27 '23 at 06:47
  • Check the network setting; make sure it is set to **`Enabled from all networks`**. – Venkatesan Jun 27 '23 at 06:51
  • It's already set to ```Enabled from all networks```. And the SAS token I generated in the portal does not produce this error. – mifan Jun 27 '23 at 06:53
  • I think your URL should be `https://lydevstorage0.dfs.core.windows.net/sparkevent/log/something.txt?sv=2022-11-02&spr=https,http&se=2023-07-04T05%3A53%3A39Z&sr=c&sp=racwdl&sig=xxxxxxx` to access the file. – Venkatesan Jun 27 '23 at 07:02
  • I am able to access the file, but the link above was not generated manually by me; it comes from Spark internals. What Spark does is create the file ```spark-738c5488436d4a71a0b8957677ff177b.inprogress``` and put event logs into it. And according to the link in the error message, Spark also does a *setAccessControl*, which failed because that permission is not granted by the SAS token. – mifan Jun 27 '23 at 07:11
  • Thanks @Venkatesan! After changing the permission set to DataLakeFileSystemSasPermissions, it works! – mifan Jun 27 '23 at 08:55
  • @Venkatesan, sure, I have accepted your answer. Please modify it slightly by changing the permission set to DataLakeFileSystemSasPermissions; this will avoid confusing people who encounter the same issue. – mifan Jun 27 '23 at 09:56

The root cause is that my Spark program cannot get or set access control (ACLs). I should not have used BlobSasBuilder or AccountSasBuilder, because the Blob endpoint is not aware of ACLs at all, so the SAS tokens they generate naturally carry no ACL-manipulation permissions.

With the help of @Venkatesan, I learned that I can use DataLakeSasBuilder instead. Data Lake Storage Gen2 follows the HDFS model, so it understands ACLs. However, the permission set used by @Venkatesan is DataLakeAccountSasPermissions, which does not include the ManageAccessControl permission. The correct permission set is DataLakeFileSystemSasPermissions. After switching to it, my program works properly.
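Concretely, only the `SetPermissions` call changes relative to the code in the first answer. A minimal sketch, reusing the `sas` builder from that answer (`DataLakeFileSystemSasPermissions` lives in `Azure.Storage.Sas`):

    // Unlike DataLakeAccountSasPermissions, this set includes the
    // ManageAccessControl permission that the ABFS driver needs for the
    // getAccessControl/setAccessControl calls seen in the error messages.
    sas.SetPermissions(DataLakeFileSystemSasPermissions.All);

`All` grants every file-system permission; a narrower combination of flags (e.g. Read | Write | Create | List | ManageAccessControl) may be enough for event logging, though I have not verified the minimal set.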

mifan