My HTTP-triggered Azure Function has a workflow that consists of 3 steps:
- It receives an API call with some parameters.
- It reads the data from Azure Blob Storage with this function:
import io

import pandas as pd
from azure.storage.blob import BlobServiceClient


def read_dataframe_from_blob(account_name, account_key, container_name, blob_name):
    # Create a connection string to the Azure Blob storage account
    connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"
    # Create a BlobServiceClient object using the connection string
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    # Get a reference to the Parquet blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    # Download the blob data as a stream
    blob_data = blob_client.download_blob()
    # Read the Parquet data from the stream into a pandas DataFrame
    df = pd.read_parquet(io.BytesIO(blob_data.readall()))
    return df
- It preprocesses the data read from the blob and returns some output.
I previously created a very similar workflow, and the Function Log Stream was clean: it included only the messages I emit through logging. However, when I read the data from the blob, the logs in the Azure Function Log Stream (and locally, of course) start with:
2023-06-05T07:35:42Z [Information] Request URL: 'https://myaccount.blob.core.windows.net/mycontainer/my.parquet'
Request method: 'GET'
Request headers:
'x-ms-range': 'REDACTED'
'x-ms-version': 'REDACTED'
'Accept': 'application/xml'
'User-Agent': 'azsdk-python-storage-blob/12.16.0 Python/3.10.11 (Linux-5.10.164.1-1.cm1-x86_64-with-glibc2.31)'
'x-ms-date': 'REDACTED'
'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
'Authorization': 'REDACTED'
No body was attached to the request
2023-06-05T07:35:42Z [Information] Response status: 206
Response headers:
'Content-Length': '33554432'
'Content-Type': 'application/octet-stream'
'Content-Range': 'REDACTED'
'Last-Modified': 'Thu, 01 Jun 2023 08:00:30 GMT'
'Accept-Ranges': 'REDACTED'
'ETag': '"0x8DB627644CFEA3E"'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': '08843836-f01e-0019-6780-974298000000'
'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
'x-ms-version': 'REDACTED'
'x-ms-creation-time': 'REDACTED'
'x-ms-blob-content-md5': 'REDACTED'
'x-ms-lease-status': 'REDACTED'
'x-ms-lease-state': 'REDACTED'
'x-ms-blob-type': 'REDACTED'
'Content-Disposition': 'REDACTED'
'x-ms-server-encrypted': 'REDACTED'
'Date': 'Mon, 05 Jun 2023 07:35:42 GMT'
...repeated multiple times. Only then do I get the messages from my own logging.
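A possible explanation for the repetition, as a hedged sketch: `download_blob().readall()` seems to fetch a large blob as a series of ranged GET requests (hence the 206 status and the `x-ms-range` header), and each request is logged separately. Assuming the client defaults of a 32 MiB first request (which would match the `Content-Length` of 33554432 above) and 4 MiB for each following chunk, the number of logged request/response pairs can be estimated. The function name and the default sizes below are assumptions, not values read from my app's configuration.

```python
import math

MIB = 1024 * 1024


def estimated_get_requests(blob_size, first_get_size=32 * MIB, chunk_size=4 * MIB):
    """Estimate how many ranged GET requests a full download would issue.

    Assumes one initial request of up to `first_get_size` bytes, followed by
    requests of up to `chunk_size` bytes for the remainder (assumed defaults).
    """
    if blob_size <= first_get_size:
        return 1
    remaining = blob_size - first_get_size
    return 1 + math.ceil(remaining / chunk_size)


if __name__ == "__main__":
    # Hypothetical 100 MiB Parquet file
    print(estimated_get_requests(100 * MIB))
```

If this reading is right, the volume of log output grows with the blob size, which would explain why a workflow without blob reads stayed quiet.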
What is the reason for such behaviour? Is there any smooth way to optimize the code or avoid these bloated logs?
Edit: I've found a similar discussion here, but I'm not sure how to replicate it for a Python app.
Edit2: It's not a solution, but I've found a GitHub bug report here.
Still, I would appreciate any workarounds.
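One workaround that is often suggested for this, sketched here under assumptions rather than confirmed against this exact app: the request/response lines appear to come from the Azure SDK's HTTP logging policy, which writes at INFO level through a standard Python logger. Raising that logger's level should hide them while leaving my own logging untouched. The logger name `azure.core.pipeline.policies.http_logging_policy` is my assumption about where these lines originate, and `quiet_azure_sdk_logs` is a hypothetical helper name.

```python
import logging

# Assumed name of the logger used by the Azure SDK's HTTP logging policy
AZURE_HTTP_LOGGER = "azure.core.pipeline.policies.http_logging_policy"


def quiet_azure_sdk_logs(level=logging.WARNING):
    """Raise the level of the Azure SDK HTTP logger so INFO records are dropped."""
    sdk_logger = logging.getLogger(AZURE_HTTP_LOGGER)
    sdk_logger.setLevel(level)
    return sdk_logger


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    quiet_azure_sdk_logs()
    # The app's own INFO messages are still emitted through the root logger
    logging.info("my own log line")
```

The idea would be to call `quiet_azure_sdk_logs()` once at module import time in the function app, before the `BlobServiceClient` is created. Applying the same level to the parent `azure` logger instead would silence the SDK more broadly, at the cost of hiding other SDK messages that might be useful.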