
I want to write a file to my Azure DataLake Gen2 with an Azure Function and Python.

Unfortunately I'm having the following authentication issue:

Exception: ClientAuthenticationError: (InvalidAuthenticationInfo) Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

'WWW-Authenticate': 'REDACTED'

Both my account and the Function App should have the necessary roles assigned for accessing my Data Lake.

And here is my function:

import datetime
import logging

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url="https://<datalake_name>.dfs.core.windows.net", credential=credential)

    file_system_client = service_client.get_file_system_client(file_system="temp")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))


    logging.info('Python timer trigger function ran at %s', utc_timestamp)

What am I missing?

THX & BR

Peter

Into Numbers
2 Answers


The problem seems to come from `DefaultAzureCredential`.

The identity `DefaultAzureCredential` uses depends on the environment. When an access token is needed, it tries these sources in turn, stopping at the first one that provides a token:

1. A service principal configured by environment variables. 
2. An Azure managed identity. 
3. On Windows only: a user who has signed in with a Microsoft application, such as Visual Studio.
4. The user currently signed in to Visual Studio Code.
5. The identity currently logged in to the Azure CLI.
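If the chain picks up an unexpected identity, one option is to pin a specific credential per environment instead of relying on the whole chain. A minimal sketch, assuming the `azure-identity` package is installed and using the `WEBSITE_INSTANCE_ID` environment variable (set by the Azure Functions host) as a heuristic for detecting that the code runs on Azure:

```python
import os

def pick_credential():
    # Heuristic assumption: the Azure Functions host sets WEBSITE_INSTANCE_ID,
    # so its presence suggests we are running on Azure rather than locally.
    if os.environ.get("WEBSITE_INSTANCE_ID"):
        from azure.identity import ManagedIdentityCredential
        # On Azure: use the Function App's system-assigned managed identity.
        return ManagedIdentityCredential()
    from azure.identity import AzureCliCredential
    # Locally: use the identity from a prior `az login`.
    return AzureCliCredential()
```

This avoids the chain silently falling back to a credential without the required role.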

In fact, you can create the Data Lake service client without going through `DefaultAzureCredential` at all, by connecting directly with the storage account's connection string:

import logging
import datetime

from azure.storage.filedatalake import DataLakeServiceClient
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    service_client = DataLakeServiceClient.from_connection_string(connect_str)

    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("test")
    file_client = directory_client.create_file("uploaded-file.txt")
    
    file_contents = 'some data'
    file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
    file_client.flush_data(len(file_contents))

    return func.HttpResponse(
            "Test.",
            status_code=200
    )

In addition, to make sure the data can be written, please check whether your Data Lake account has any access restrictions configured (for example, firewall or network rules).

Cindy Pau
  • My problem is that connection strings for the Data Lake are disabled for security reasons. DefaultAzureCredential is using my CLI identity, which has sufficient rights on the DataLake. – Into Numbers Feb 17 '21 at 13:50
  • @IntoNumbers Okay, I will try to reproduce the same issue tomorrow, now there is something to do. Thanks for reply.:) – Cindy Pau Feb 17 '21 at 13:54
  • that would be awesome, thank you very much! – Into Numbers Feb 17 '21 at 14:26
  • @IntoNumbers I'm pretty sure the problem is not in your code. Because my environment is not the same as yours, and I have tried various methods but have not reproduced the same error as yours (the worst is that the error on your side does not show the specific problem). So, I would like to ask you to test it on your side. I think you can try the following steps to make the 'credential' work both on locally and on Azure respectively: – Cindy Pau Feb 18 '21 at 08:58
  • @IntoNumbers For local, please deprecate `DefaultAzureCredential()`, because `DefaultAzureCredential()` seems to be a problem on your side. You can try to use `AzureCliCredential()` instead of `DefaultAzureCredential()` (Before that, please log in to your azure account by command in advance and ensure your account has write permissions.) – Cindy Pau Feb 18 '21 at 09:00
  • @IntoNumbers For Azure, please also do not use `DefaultAzureCredential()`, please use `ManagedIdentityCredential()` instead, and then enable the 'identity' of the function app and give it write access to datalake. – Cindy Pau Feb 18 '21 at 09:01
  • @IntoNumbers And you can first try to let the main logic pass in HttpTrigger instead of TimeTrigger(so you can avoid waiting). You can do a try of above steps, and any update of this question please let me know. Thanks.:) – Cindy Pau Feb 18 '21 at 09:04
  • @IntoNumbers The RBAC role you need to give is: `Storage Blob Data Owner`. – Cindy Pau Feb 18 '21 at 09:09
  • Thanks a lot for your effort. I will try to change DefaultAzureCredential() to the mentioned alternatives tonight and keep you up to date. As far as RBAC is concerned, my function has Storage Blob Data Contributor, is there a reason why you suggest Storage Blob Data Owner ? – Into Numbers Feb 18 '21 at 14:02
  • @IntoNumbers Contributor should also work. – Cindy Pau Feb 18 '21 at 14:03
  • I've tried your suggestions, but now some new issues have developed: in the meantime the role assignment in the datalake for the function is suddenly marked as "unknown", and when I try to recreate it, I can't select the FunctionApp anymore. This is also true when I try to use a new FunctionApp. – Into Numbers Feb 19 '21 at 10:14
  • 1
    But I had success with AzureCliCredential(), and due to the fact, that it now seems to be a role assignment problem within Azure, I will accept your answer. Thanks again for your efforts! BR Peter – Into Numbers Feb 19 '21 at 10:16
  • @IntoNumbers Does it also work locally with the Azure CLI credential? – Cindy Pau Feb 19 '21 at 10:16
  • 1
    yes, it did work on local with AzureCliCredential() – Into Numbers Feb 19 '21 at 10:17

The function suggested by Bowman Zhu contains an error. According to the Azure documentation, the `length` parameter expects the length in bytes. However, the suggested function passes the length in characters. Some characters consist of multiple bytes; in such cases the function will not write all bytes of `file_contents` to the file, and will thus cause data loss!
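The difference is easy to see with a multi-byte string (plain Python, no Azure SDK needed):

```python
file_contents = 'héllo'              # 'é' occupies 2 bytes in UTF-8
print(len(file_contents))            # 5 characters
print(len(file_contents.encode()))   # 6 bytes -> what append_data/flush_data need
```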

Therefore,

file_client.append_data(data=file_contents, offset=0, length=len(file_contents))
file_client.flush_data(len(file_contents))

must be something like:

length = len(file_contents.encode())
file_client.append_data(data=file_contents, offset=0, length=length)
file_client.flush_data(offset=length)
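Equivalently, you can encode the text once and reuse the byte payload for both calls. A sketch, where `write_text` is a hypothetical helper and `file_client` is assumed to be a `DataLakeFileClient` as above:

```python
def write_text(file_client, text, encoding="utf-8"):
    # Encode once so the byte count passed to append_data/flush_data
    # always matches the payload actually written.
    payload = text.encode(encoding)
    file_client.append_data(data=payload, offset=0, length=len(payload))
    file_client.flush_data(len(payload))
```

This way the character/byte mismatch cannot occur, because the same byte string determines both the data and the length.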
Guido van Steen