I am using DLT (Delta Live Tables) with Python in one of our ETL pipelines, which processes Kafka topics. When a DLT pipeline runs we cannot print any log/status messages, so I tried custom logging with the standard logging library. It does not throw any errors, but we still cannot see the logs anywhere. I would appreciate any pointers on implementing custom logging in DLT pipelines. I tried the following, but no logs are displayed in the pipeline console:
import json
import time
import logging
from datetime import datetime

import dlt
from pyspark.sql.functions import from_json, to_json, col, lit, coalesce, date_format
from pyspark.sql.types import StructType, StructField, StringType, LongType

logger = logging.getLogger("cmddb_raw_zone")
# formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger.info('Processing of landing to Raw layer has started {0}'.format(datetime.now()))
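One thing I suspect (but have not confirmed) is that the logger has no handler attached and keeps the default WARNING level, so INFO messages are silently dropped. This is a minimal sketch of the explicit configuration I was planning to try next, assuming the output would at least reach the driver's stdout logs rather than the pipeline console:

import sys
import logging
from datetime import datetime

logger = logging.getLogger("cmddb_raw_zone")
logger.setLevel(logging.INFO)  # default effective level is WARNING, so .info() calls are otherwise dropped

# Without an explicit handler, records only pass through the last-resort handler (stderr, WARNING and above).
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)

logger.info('Processing of landing to Raw layer has started %s', datetime.now())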
I am aware of the audit/event logs where we can see log entries, but I would like to know whether it is possible to view custom messages in the pipeline console, where there is a section that displays system logs.
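For completeness, the only place I have found any context so far is the pipeline event log itself, which (if I am reading the docs correctly) can be queried from the pipeline's storage location as a Delta table. The path below is my assumption, not something I have verified:

# Assumed location of the DLT event log under the pipeline's configured storage path (placeholder values).
storage_location = "dbfs:/pipelines/<pipeline-id>"
events = spark.read.format("delta").load(f"{storage_location}/system/events")
events.select("timestamp", "event_type", "message").orderBy("timestamp", ascending=False).show(truncate=False)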
My pipeline has been running for a couple of days now, and I would like to find out which cell (table definition) is causing the delay. Without custom logging it is very difficult to troubleshoot DLT pipelines.
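In case it helps clarify what I am after, this is the kind of instrumentation I would like to add around each table definition. log_duration is a hypothetical helper of mine, the Kafka options are placeholders, and I realise it would only measure the driver-side time to define the DataFrame, not the actual streaming execution:

import time
import logging
from contextlib import contextmanager

import dlt

logger = logging.getLogger("cmddb_raw_zone")

@contextmanager
def log_duration(step_name):
    # Hypothetical helper: log wall-clock time spent in a driver-side code block.
    start = time.perf_counter()
    logger.info("%s started", step_name)
    try:
        yield
    finally:
        logger.info("%s finished in %.2f seconds", step_name, time.perf_counter() - start)

@dlt.table(name="cmddb_raw")
def cmddb_raw():
    with log_duration("define cmddb_raw"):
        # 'spark' is the session provided in the DLT notebook environment.
        return (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "<broker>")  # placeholder
            .option("subscribe", "<topic>")                 # placeholder
            .load()
        )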