
I'm trying to write a logging system in Databricks for a few jobs we need to run. Currently I'm setting up a logger and logging the file in-memory: `log_stream = io.StringIO()`

All functions are wrapped in a try/except block so that info and exceptions are caught and written to the logger. It also guarantees that the last block of the notebook will run, which is needed because that block contains the code that uploads the in-memory file to blob storage.

However, I feel this method is quite 'ugly', since every piece of code needs to be wrapped in a try/except block.

Are there any methods to always run the last block of the notebook even when a part of the code completely fails/errors? Or is there another method to ensure that the logfile is uploaded immediately in case of any errors?

current code:
-- logging --

import io
import logging

log_stream = io.StringIO()

logger = logging.getLogger(database_name_bron)
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

handler = logging.StreamHandler(log_stream)
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(handler)

-- code block example --

try:
    table = output_dict['*'].select(
        col('1*').alias('1*'),
        col('2*').alias('2*'),
        col('3*').alias('3*'),
        col('4*').alias('4*'),
        col('5*').alias('5*'),
    )

    # join tables
    table2 = table2.join(table1, table2['5*'] == table1['4*'], 'left')
    logger.info('left join of table1 and table2')
except Exception as e:
    logger.exception(f"An error occurred while joining table1 and table2: {e}")

-- upload block --

# extract log data
log_content = log_stream.getvalue()

# upload data to blob storage
dbutils.fs.put(f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/{p_container_name}", log_content, overwrite=True)

# cleanly close the handler
logger.removeHandler(handler)
handler.close()
  • are you using azure databricks? – DileeprajnarayanThumula Jul 04 '23 at 16:40
  • yes, you are correct – Mitchell Jul 05 '23 at 06:47
  • are you asking that your upload block should not be in every try/catch block and should run after all try/catch blocks even if there is an error? – JayashankarGS Jul 05 '23 at 11:50
  • almost, the upload block is only at the end of the notebook, so it is there only once. What I would like is to remove the try/catch block from every part of the code and every function to make the code more readable and clean. Whether this is achieved by using a completely different method of custom logging, or any way to always ensure this upload code runs when the notebook fails, does not matter to me. It is mainly that I cannot really find a good way of doing custom logging in Databricks notebooks while having the option to upload the log to Azure blob storage. – Mitchell Jul 05 '23 at 15:14

1 Answer


You can run your code in a custom logging context by creating one.

Below is the code.

import logging
import io

class LoggingContext:
    def __init__(self, logger, storage_account, container_name, p_container_name):
        self.logger = logger
        self.storage_account = storage_account
        self.container_name = container_name
        self.p_container_name = p_container_name
        self.log_stream = io.StringIO()

    def __enter__(self):
        self.handler = logging.StreamHandler(self.log_stream)
        self.handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        self.handler.setFormatter(formatter)
        self.logger.addHandler(self.handler)
        return self.logger

    def __exit__(self, exc_type, exc_val, exc_tb):
        # log the exception while the handler is still attached
        if exc_type is not None:
            self.logger.exception(f"An error occurred: {exc_val}")
        # upload the collected log content to blob storage
        log_content = self.log_stream.getvalue()
        dbutils.fs.put(f"abfss://{self.container_name}@{self.storage_account}.dfs.core.windows.net/{self.p_container_name}", log_content, overwrite=True)
        # detach and close the handler
        self.logger.removeHandler(self.handler)
        self.handler.close()

Here, `__init__` initializes all required variables for the context. `__enter__` configures the log settings such as the format, level, log stream, and handler. And `__exit__` logs any exception, uploads the log content to storage, and closes the handler.

logger = logging.getLogger("database_name_bron")
logger.setLevel(logging.DEBUG)
storage_account = 'jgsadls'
container_name = 'data'
p_container_name = 'databricks_log'

Add your required information here.

Context code:

with LoggingContext(logger, storage_account, container_name, p_container_name):
    logger.info('Log started')
    logger.info("Table joined.Check logs for more info....")
    y = 1/0
    logger.info("Log ended")

Here, wrap your code blocks in LoggingContext as shown above, wherever you run them.
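
Applied to the join example from the question, the same pattern would look roughly like this (a sketch; table1, table2 and the column placeholders are assumed to be defined earlier in the notebook, as in the question):

with LoggingContext(logger, storage_account, container_name, p_container_name):
    logger.info('Log started')
    # left join of table1 and table2; if this raises, __exit__ still
    # logs the exception and uploads the log to blob storage
    table2 = table2.join(table1, table2['5*'] == table1['4*'], 'left')
    logger.info('left join of table1 and table2')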


Whenever you run a code block, just run it inside LoggingContext instead of a try/except.

  • Thanks for the suggestion, but this would only catch the error in the log, right? There is no way to add extra info to the logger when the error appears? For example, in your example: an error occurred during the division by zero task: 'error statement' – Mitchell Jul 07 '23 at 07:57
  • Then you need to use a try/except block inside this `with LoggingContext`, because to log extra info when an error appears you need to catch the error in the except block. – JayashankarGS Jul 07 '23 at 09:51
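
A minimal sketch of that combination (the division by zero stands in for any failing step; re-raising lets __exit__ still log the failure and upload the log):

with LoggingContext(logger, storage_account, container_name, p_container_name):
    try:
        y = 1 / 0
    except ZeroDivisionError as e:
        # add the extra context here, then re-raise so the notebook still fails
        logger.exception(f"An error occurred during the division by zero task: {e}")
        raise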