
I am using the following function to send standard-output logging from Databricks to Azure Application Insights.

My function:

import logging
from opencensus.ext.azure.log_exporter import AzureLogHandler
from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

def custom_logging_function(log_type, instrumentation_key_value, input_x):
    """
    Purpose: Send the standard output to Application Insights logs.
    Inputs: -
    Return: -
    """
    config_integration.trace_integrations(['logging'])
    logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
    tracer = Tracer(sampler=AlwaysOnSampler())

    logger = logging.getLogger(__name__)
    logger.addHandler(AzureLogHandler(connection_string='InstrumentationKey={0}'.format(instrumentation_key_value)))

    if log_type == "INFO" or log_type == "SUCCESSFUL":
        # [UPDATE]
        logger.setLevel(logging.INFO)
        logger.info(input_x)
    elif log_type == "ERROR":
        # [UPDATE]
        logger.setLevel(logging.ERROR)
        logger.exception(input_x)
    else:
        logger.warning(input_x)

[UPDATE] By setting the logger's level to INFO or ERROR you can log the different types of traces.
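The root cause behind this fix is that a Python logger's effective level defaults to WARNING (inherited from the root logger), so info() records are silently dropped until the level is lowered. A minimal stdlib-only sketch, independent of Application Insights (the logger name is arbitrary):

```python
import logging

# A fresh logger has level NOTSET and inherits the root logger's default
# level, WARNING -- so info() calls are silently dropped.
logger = logging.getLogger("level_demo")
assert not logger.isEnabledFor(logging.INFO)

# Lowering the level (as in the [UPDATE] above) lets info() through.
logger.setLevel(logging.INFO)
assert logger.isEnabledFor(logging.INFO)
```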

Even though this function executes correctly, it is faulty for the following two reasons:

Reason 1
When I log a logger.info() message, it never appears in Application Insights. For some inexplicable reason, only the logger.warning() messages reach the Application Insights logs. For example,

custom_logging_function("INFO", instrumentation_key_value, "INFO: {0} chronological dates in the specified time-frame have been created!".format(len(date_list)))

# Uses the logger.info() based on my function!

Output: (screenshot omitted)

This message is never logged. Only the following is logged:

custom_logging_function("WARNING", instrumentation_key_value, "INFO: {0} chronological dates in the specified time-frame have been created!".format(len(date_list)))

# Uses the logger.warning() based on my function!

Output: (screenshot omitted)

Reason 1 has been solved by me; please check the [UPDATE] comments in my function.

------------------------------------------------------------------------

Reason 2

The same message is logged multiple times instead of only once.
Some code to reproduce the problem:

import math
from datetime import datetime, timedelta

# Set keyword parameters
time_scale = 12
time_frame_repetition = 1
timestamp_snapshot = datetime.utcnow()

round_up = math.ceil(time_frame_repetition*365/time_scale)
day_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%d") for x in range(round_up)]
month_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%m") for x in range(round_up)]
year_list = [(timestamp_snapshot - timedelta(days=x)).strftime("%Y") for x in range(round_up)]

date_list=[[day_list[i], month_list[i], year_list[i]] for i in range(0, len(day_list))]

custom_logging_function("INFO", instrumentation_key_value, "INFO: {0} chronological dates in the specified time-frame have been created!".format(len(date_list)))  # the function defined at the start of my post

The output of the above snippet is logged more than once in Application Insights, and I am trying to figure out why.

Output log in Application Insights: (screenshot omitted)

As you can see from the query output, the same row is logged multiple times.
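The duplication is consistent with a handler being attached more than once: each call to custom_logging_function adds a fresh AzureLogHandler to the same logger, and every attached handler emits every record. A stdlib-only sketch of that effect (CountingHandler is a hypothetical stand-in for AzureLogHandler):

```python
import logging

counts = {"emitted": 0}

class CountingHandler(logging.Handler):
    """Hypothetical stand-in for AzureLogHandler: counts emitted records."""
    def emit(self, record):
        counts["emitted"] += 1

logger = logging.getLogger("dup_demo")
logger.setLevel(logging.INFO)

# Three calls, each attaching a *new* handler object; addHandler only
# deduplicates identical objects, so all three stay attached.
for _ in range(3):
    logger.addHandler(CountingHandler())

logger.info("hello")
print(counts["emitted"])  # one record, emitted once per handler: 3
```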

What are your suggestions on the second matter, since the first one is solved?


[UPDATE] based on the answer provided below by @lzchen

def instantiate_logger(instrumentation_key_value):
    config_integration.trace_integrations(['logging'])
    logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
    tracer = Tracer(sampler=AlwaysOnSampler())

    logger = logging.getLogger(__name__)
    logger.addHandler(AzureLogHandler(connection_string='InstrumentationKey={0}'.format(instrumentation_key_value)))

    return logger

logging_instance = instantiate_logger(instrumentation_key_value)

def custom_logging_function(logging_instance, disable_logging, log_type, input_x, *arguments):
    """
    Purpose: Send the standard output to Application Insights logs.
    Inputs: -
    Return: -
    """
    if disable_logging == 0:

        if log_type == "INFO" or log_type == "SUCCESSFUL":
            logging_instance.setLevel(logging.INFO)
            logging_instance.info(input_x)
            print(input_x, *arguments)

        elif log_type == "ERROR":
            logging_instance.setLevel(logging.ERROR)
            logging_instance.exception(input_x)
            print(input_x, *arguments)

        else:
            logging_instance.warning(input_x)
            print(input_x, *arguments)

    else:
        print(input_x, *arguments)
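One detail worth keeping in mind with this pattern: logging.getLogger(name) returns the same cached logger object on every call, so any handler attached to it in one notebook cell is still attached the next time the cell runs. A stdlib-only sketch:

```python
import logging

a = logging.getLogger("shared_demo")
b = logging.getLogger("shared_demo")

# The logging module caches loggers by name: both names refer to one
# object, so handlers accumulate across repeated setup calls.
assert a is b

a.addHandler(logging.NullHandler())
a.addHandler(logging.NullHandler())
print(len(b.handlers))  # two distinct handler objects attached: 2
```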

The code above still logs the output of this call multiple times:

date_list=merge_hierarchy_list(year_list, month_list, day_list, None, None)
custom_logging_function(logging_instance, disable_logging_value, "INFO", "INFO: {0} chronological dates in the specified time-frame have been created!".format(len(date_list)))

Output (logged twice in Application Insights traces):

"INFO: 31 chronological dates in the specified time-frame have been created!"


NikSp
  • Where do you call this function? Is there any chance you're using multi-threading to call it? – Ivan Glasenberg May 29 '20 at 09:37
  • @IvanYang No, I don't use multi-threading to call this function. After a loop finishes that creates the list of paths, I call the logging function to log the number of elements inside the list_of_paths object (```len(list_of_paths)```). I add the custom_logging_function() after a loop, that's all. – NikSp May 29 '20 at 11:43
  • Do you mind providing sample code which can reproduce the issue? – Ivan Glasenberg May 30 '20 at 03:04
  • @IvanYang I have added some code to reproduce the same problem in Application Insights logs. Please note that you must have an active Azure Application Insights service. Write me in the comments if you need anything else. – NikSp May 31 '20 at 06:49
  • It's weird. I used the code and saw only one message logged in App Insights. I suggest you check the following: 1. not sure if it's an issue with your Kusto query; can you just use "traces" to query? 2. in the client, use Fiddler or a debugger to see how many messages are sent. – Ivan Glasenberg Jun 01 '20 at 06:43
  • @IvanYang It's really weird, I agree. Regarding your answer: first, if I use traces only in my Kusto query I will see all the messages from other Azure services except Databricks, which I don't want. I only want to see messages coming from Databricks, and thus the Python requests. Here is my Kusto query: ```traces | where client_Browser contains "Python Requests" or cloud_RoleName contains "PythonShell.py" and iKey contains ""```. As for your second point, I don't understand the *debug* or *Fiddler* approach in the client. Where do I configure this? – NikSp Jun 01 '20 at 07:12

1 Answer


For reason 2:

Are you running your Python file within Databricks notebooks? The notebooks keep the state of all instantiated objects (including the Python logger). We have come across duplicated log entries before when users run their code multiple times in notebooks, because the AzureLogHandler is added as a handler to the logger every time the code is executed again. Running the code as a normal Python module should not cause this behaviour, since state is not kept across subsequent runs.

If you are not using notebooks, then the issue is likely something adding the AzureLogHandler multiple times. Are there multiple workers of some sort in your Databricks pipeline executing the same logic?
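A common guard for notebook environments is to make the handler attachment idempotent: only add the exporter if the logger does not already have one. A sketch of that pattern; get_insights_logger and handler_factory are hypothetical names, and any zero-argument callable returning a logging.Handler (e.g. a lambda wrapping AzureLogHandler) can be passed in:

```python
import logging

def get_insights_logger(name, handler_factory):
    """Return the named logger, attaching the export handler at most once.

    handler_factory: hypothetical parameter -- a zero-argument callable
    returning a logging.Handler (a lambda wrapping AzureLogHandler here).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Safe to re-run in a notebook cell: nothing is added the second time.
    if not logger.handlers:
        logger.addHandler(handler_factory())
    return logger

# Running the setup twice leaves exactly one handler attached.
get_insights_logger("insights_demo", logging.NullHandler)
logger = get_insights_logger("insights_demo", logging.NullHandler)
print(len(logger.handlers))  # 1
```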

lzchen
  • UPDATE: It seems you are instantiating a new AzureLogHandler for every call to custom_logging_function. This passes every log record through every handler attached to `logger` and, as a result, exports multiple copies of the exact same record to App Insights. Try taking all the "config"-related logic out, instantiating `AzureLogHandler` once, adding it to `logger`, and then within your logging calls only have `logger.warning(...)`. – lzchen Jun 08 '20 at 18:55
  • To answer your question: yes, I am running my notebook on a Databricks cluster with 2-3 workers. Also, I am not running any code in the notebook multiple times. In only one execution of a Databricks code block I see its output logged 3 times for some reason. Regarding the update you mentioned in the comments, please check the update in my post and see if it solves the problem. As you wrote, I now initialize the logger outside of the main logging function. – NikSp Jun 09 '20 at 07:06
  • @lzchen I also want to add that my script is not a .py file but rather a Databricks notebook. This multiple-logging behavior is still present when I run the notebook as an automated job (thus, only 1 instance). I don't execute the same Databricks block with multiple workers at the same time. Maybe the mistake is right in front of me, but I can't see it. Executing everything once, with a cleared state, still logs my output many times. – NikSp Jun 09 '20 at 07:54
  • @NikSp The updated code you submitted should be enough to solve the duplicated log entries. If the problem still persists as you say, I'm leaning towards the configuration and how you are running this code in Databricks being the cause. To verify, is it possible for you to run this code just as a Python script? If you do not get duplicates, we can narrow it down to the possibility that either Databricks is still keeping state somehow, or the workers are executing your configuration of the LogHandler multiple times. We can start debugging then. – lzchen Jun 09 '20 at 16:51
  • @lzchen You are right about Python. Running the attached script as a Python program in a Jupyter notebook does not post multiple logs of the same standard output. – NikSp Jun 09 '20 at 21:15
  • Databricks produces this problem, and I really don't understand why. As you mention, we should probably check the possibility of Databricks maintaining the state of the handler. If workers are the problem, then every time the cluster scales up to more than 1 worker the problem will always be present, right? – NikSp Jun 09 '20 at 21:19
  • Is there a way to force Databricks to spawn only 1 worker? If that fixes the problem, we can assume the multiple workers are instantiating the log handler multiple times. I am leaning more towards Databricks keeping state somehow, though (multiple workers SHOULD not be a problem). – lzchen Jun 09 '20 at 23:40
  • Indeed, with only 1 worker plus the driver alive on the cluster, my prints are logged only once. So I agree that the multiple workers are somehow the cause of the multiple logging in Application Insights. However, you can understand that executing workloads with only 1 worker is not feasible in production environments. – NikSp Jun 10 '20 at 06:06
  • @lzchen Have you figured out any solution to this? A workaround to both use multiple workers and avoid logging the same stdout multiple times? – NikSp Jun 22 '20 at 16:33
  • @NikSp I can't seem to replicate the issue using Databricks; it's possible it is a different problem. I would advise you to file an issue via the support blade on your Application Insights resource. – lzchen Jun 23 '20 at 17:23
  • OK then, I will open an issue with the Azure Application Insights service regarding the opencensus library and its behavior on Databricks. – NikSp Jun 24 '20 at 06:49