
My HTTP-triggered Azure Function has a workflow that consists of 3 steps:

  1. It receives an API call with some parameters

  2. It reads the data from the Azure Blob with this function:

import io

import pandas as pd
from azure.storage.blob import BlobServiceClient


def read_dataframe_from_blob(account_name, account_key, container_name, blob_name):
    # Create a connection string to the Azure Blob storage account
    connect_str = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"

    # Create a BlobServiceClient object using the connection string
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)

    # Get a reference to the Parquet blob
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)

    # Download the blob data as a stream
    blob_data = blob_client.download_blob()

    # Read the Parquet data from the stream into a pandas DataFrame
    df = pd.read_parquet(io.BytesIO(blob_data.readall()))

    return df
  3. It preprocesses the data from step 2 and returns some output.

I previously created a very similar workflow and the Function Log Stream was pretty clean: it included only the messages I emit through logging. However, when I read the data from the blob, the logs in the Azure Function Log Stream (and locally, of course) start with:

2023-06-05T07:35:42Z   [Information]   Request URL: 'https://myaccount.blob.core.windows.net/mycontainer/my.parquet'
Request method: 'GET'
Request headers:
    'x-ms-range': 'REDACTED'
    'x-ms-version': 'REDACTED'
    'Accept': 'application/xml'
    'User-Agent': 'azsdk-python-storage-blob/12.16.0 Python/3.10.11 (Linux-5.10.164.1-1.cm1-x86_64-with-glibc2.31)'
    'x-ms-date': 'REDACTED'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'Authorization': 'REDACTED'
No body was attached to the request
2023-06-05T07:35:42Z   [Information]   Response status: 206
Response headers:
    'Content-Length': '33554432'
    'Content-Type': 'application/octet-stream'
    'Content-Range': 'REDACTED'
    'Last-Modified': 'Thu, 01 Jun 2023 08:00:30 GMT'
    'Accept-Ranges': 'REDACTED'
    'ETag': '"0x8DB627644CFEA3E"'
    'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
    'x-ms-request-id': '08843836-f01e-0019-6780-974298000000'
    'x-ms-client-request-id': '932afd88-0373-11ee-8724-1270efe16c2d'
    'x-ms-version': 'REDACTED'
    'x-ms-creation-time': 'REDACTED'
    'x-ms-blob-content-md5': 'REDACTED'
    'x-ms-lease-status': 'REDACTED'
    'x-ms-lease-state': 'REDACTED'
    'x-ms-blob-type': 'REDACTED'
    'Content-Disposition': 'REDACTED'
    'x-ms-server-encrypted': 'REDACTED'
    'Date': 'Mon, 05 Jun 2023 07:35:42 GMT'

...repeated multiple times. Then I get the info from my logs.

What is the reason for such behaviour? Is there any smooth way to optimize the code or avoid these bloated logs?

Edit: I've found a similar discussion here but I'm not sure how to replicate it for Python app.

Edit2: It's not a solution, but I've found a GitHub bug report here

Still - would appreciate any workarounds.

Freejack

1 Answer

import logging

# Set the desired log level (e.g., INFO, DEBUG, ERROR)
logging.basicConfig(level=logging.INFO)

def main(req):
    # Your code to access data from Azure Blob

    # Example logging statements
    logging.info("Accessing data from Azure Blob")
    logging.debug("Debug message")
    logging.error("Error message")

    # Rest of your function code

    return "Function executed successfully"

In this code, logging.basicConfig() sets up the basic configuration for logging, including the desired log level. You can adjust the log level to control the verbosity of the logs (e.g., logging.INFO, logging.DEBUG, logging.ERROR).
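
The root-level setting applies to every logger at once. A narrower workaround can be sketched as follows, assuming the request/response dumps come from the Azure SDK's HTTP logging policy and that it logs at INFO level under the logger name `azure.core.pipeline.policies.http_logging_policy`. The helper name `quiet_azure_http_logging` is hypothetical and only for illustration. The sketch raises the threshold of that one logger and leaves the function's own logging untouched:

```python
import logging

# Assumption: the request/response dumps are emitted at INFO level by the
# Azure SDK's HTTP logging policy under this logger name.
AZURE_HTTP_LOGGER = "azure.core.pipeline.policies.http_logging_policy"


def quiet_azure_http_logging(level: int = logging.WARNING) -> logging.Logger:
    """Raise the threshold of the SDK's HTTP logger only (hypothetical helper)."""
    sdk_logger = logging.getLogger(AZURE_HTTP_LOGGER)
    sdk_logger.setLevel(level)
    return sdk_logger


# Call once at module import time, before the first blob request is made.
quiet_azure_http_logging()
```

With this in place, messages below WARNING from that logger are dropped, while `logging.info(...)` calls in the function body still go through whatever the root logger allows. If other SDK loggers turn out to be noisy as well, the same call on the parent `azure` logger may be worth trying.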

luk2302
I'm afraid it is not related to logging, but to some pre-defined telemetry that I don't know how to access. – Freejack Jun 05 '23 at 09:58