
To process a huge amount of data, we want to use Azure Data Lake Storage Gen2 with Azure Batch. Here's what I have tried:

I created a pool and a job, and uploaded a file to the Data Lake file system (following the Microsoft docs). When the Batch task tried to download the resource file from the Data Lake file system, it failed. Here's the code:

var poolId = Guid.NewGuid().ToString(); //using poolId for fileSystem, pool, and job
var sharedKeyCredential = new StorageSharedKeyCredential(storageAccountName, storageAccountKey);
string dfsUri = "https://" + storageAccountName + ".dfs.core.windows.net";
DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient(new Uri(dfsUri), sharedKeyCredential);

//Create File System
await dataLakeServiceClient.CreateFileSystemAsync(poolId);

//Create Directory
DataLakeFileSystemClient fileSystemClient = dataLakeServiceClient.GetFileSystemClient(poolId);
await fileSystemClient.CreateDirectoryAsync("my-directory");

//Upload File To FileSystem
DataLakeDirectoryClient directoryClient = fileSystemClient.GetDirectoryClient("my-directory");
DataLakeFileClient fileClient = directoryClient.GetFileClient(fileName);
await fileClient.UploadAsync(filePath);

//Pool, Job Created (keeping JobId = poolId), Now adding task to the Job
using (var batchClient = BatchClient.Open(new BatchTokenCredentials(batchAccountUrl, tokenProvider)))
{
    var inputFile = ResourceFile.FromUrl(fileClient.Uri.AbsoluteUri, fileName);
    var task = new CloudTask(TaskId, CommandLine)
    {
        UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin, scope: AutoUserScope.Task)),
        ResourceFiles = new List<ResourceFile> { inputFile }, //Add resource file
        OutputFiles = CreateOutputFiles(batchStorageAccount, poolId) //any *.txt file
    };
    batchClient.JobOperations.AddTask(poolId, task);
}

After adding the task, I get a ResourceContainerAccessDenied error, which means the Batch task did not have permission to access the file that was uploaded to the storage account.

When I use blob storage containers instead, the Batch service works as expected; in that case authentication is done with a SAS token. Here, however, I cannot figure out how to generate a SAS token, or otherwise authenticate the storage account, so that the Batch service can download the resource file onto the node.
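What I imagine should work, but have not verified, is generating a read-only SAS for the uploaded file with DataLakeSasBuilder (from the same Azure.Storage.Files.DataLake package used above) and passing the resulting URL to ResourceFile.FromUrl, analogous to the container SAS approach. A sketch, reusing poolId, fileName, fileClient, and sharedKeyCredential from the code above:

```csharp
using Azure.Storage.Files.DataLake;
using Azure.Storage.Sas;
using Microsoft.Azure.Batch;

// Build a service SAS scoped to the single uploaded file, signed with the
// account key credential already used to create the DataLakeServiceClient.
var sasBuilder = new DataLakeSasBuilder
{
    FileSystemName = poolId,                       // file system created above
    Path = "my-directory/" + fileName,             // path within the file system
    Resource = "b",                                // "b" = blob/file-level SAS
    ExpiresOn = DateTimeOffset.UtcNow.AddHours(2)  // keep the token short-lived
};
sasBuilder.SetPermissions(DataLakeSasPermissions.Read); // read-only is enough for a resource file

// Append the SAS query parameters to the file's URI.
Uri sasUri = new DataLakeUriBuilder(fileClient.Uri)
{
    Sas = sasBuilder.ToSasQueryParameters(sharedKeyCredential)
}.ToUri();

// Hand the SAS-authenticated URL to the Batch task as a resource file.
var inputFile = ResourceFile.FromUrl(sasUri.AbsoluteUri, fileName);
```

Since the dfs endpoint is backed by the same storage account as the blob endpoint, I assume a SAS generated this way would be honored, but I don't know whether the Batch node can download from the dfs URL at all, or whether the URL needs to be rewritten to the blob endpoint first.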

Any other alternative for the Data Lake Gen2 file system would also be helpful.

Gour Gopal
