
I am struggling to enable DEBUG logging for a Glue script using PySpark only.

I have tried:

import...

def quiet_logs(sc):
    logger = sc._jvm.org.apache.log4j
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)


def main():

    # Get the Spark Context
    sc = SparkContext.getOrCreate()
    sc.setLogLevel("DEBUG")
    quiet_logs(sc)

    context = GlueContext(sc)
    logger = context.get_logger()

    logger.debug("I only want to see this..., and for all others, only ERRORS")
    ...

I have '--enable-continuous-cloudwatch-log' set to true, but simply cannot get the log trail to only write debug messages for my own script.

Jaco Van Niekerk

2 Answers

I haven't managed to do exactly what you want, but I was able to do something similar by setting up a separate custom log, and this might achieve what you're after.

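import logging
import os
import sys

from awsglue.utils import getResolvedOptions
from watchtower import CloudWatchLogHandler

# Resolve the run ID from the job arguments
args = getResolvedOptions(sys.argv, ["JOB_RUN_ID"])
job_run_id = args["JOB_RUN_ID"]

os.environ["AWS_DEFAULT_REGION"] = "eu-west-1"

# Write to a dedicated stream inside the continuous-logging log group
lsn = f"{job_run_id}_custom"
# In newer watchtower releases these keyword arguments are named
# log_group_name and log_stream_name
cw = CloudWatchLogHandler(
    log_group="/aws-glue/jobs/logs-v2", stream_name=lsn, send_interval=4
)

# Replace the root logger's handlers so only the CloudWatch handler remains
slog = logging.getLogger()
slog.setLevel(logging.DEBUG)
slog.handlers = []
slog.addHandler(cw)
slog.info("hello from the custom logger")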

Now anything you log to slog will go into a separate log stream, accessible as one of the entries in the 'output' logs.

Note that you need to pass watchtower via the --additional-python-modules job parameter when you run the Glue job.
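The snippet above attaches the handler to the root logger, so records from every library also reach it. A possible variation, sketched here with the standard library only, is to use a named logger with propagate turned off so that only your own messages reach the handler. The logger name "my_glue_job" is hypothetical, and a StreamHandler writing to an in-memory buffer stands in for CloudWatchLogHandler so the sketch runs outside Glue.

```python
import io
import logging

# Stand-in for CloudWatchLogHandler (assumption: any logging.Handler
# can be attached the same way).
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# "my_glue_job" is a hypothetical name for the script's own logger.
slog = logging.getLogger("my_glue_job")
slog.setLevel(logging.DEBUG)
slog.addHandler(handler)
slog.propagate = False  # keep these records out of the root logger's handlers

slog.debug("hello from the custom logger")

# A library logger is unaffected: it has no DEBUG level set and does not
# write to our handler.
logging.getLogger("some.library").debug("not captured")

captured = buffer.getvalue()
```

In a Glue job you would presumably swap the StreamHandler for the CloudWatchLogHandler instance (cw) created above.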


RobinL

This can be done with a regular Python logging setup.

I have tested this in a Glue job, and only the one debug message was visible. The job was configured with "Continuous logging", and the message ended up in "Output logs".

import logging

# Warning level on root
logging.basicConfig(level=logging.WARNING, format='%(asctime)s [%(levelname)s] [%(name)s] %(message)s')

logger = logging.getLogger(__name__)

# Debug level only for this logger
logger.setLevel(logging.DEBUG)

logger.debug("DEBUG_LOG test")

You can also mute specific loggers:

logging.getLogger('botocore.vendored.requests.packages.urllib3.connectionpool').setLevel(logging.WARNING)
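
The reason this works is that a logger with no level of its own inherits the effective level of its nearest configured ancestor, ending at the root. Here is a small self-contained sketch of that behaviour. The logger names "my_job" and "noisy.library" are made up for illustration, and the output is captured in a buffer rather than sent to CloudWatch.

```python
import io
import logging

# Capture output in memory so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s] [%(name)s] %(message)s"))

# WARNING on the root: the default for every logger without its own level.
root = logging.getLogger()
root.setLevel(logging.WARNING)
root.addHandler(handler)

# DEBUG only for the script's own (hypothetically named) logger.
job_logger = logging.getLogger("my_job")
job_logger.setLevel(logging.DEBUG)

# A library logger with no explicit level inherits WARNING from the root.
noisy = logging.getLogger("noisy.library")

job_logger.debug("visible")
noisy.debug("dropped")
noisy.warning("still shown")

lines = buffer.getvalue().splitlines()
```

After this runs, lines holds the debug record from "my_job" and the warning from "noisy.library"; the library's debug record is filtered out before it reaches any handler.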
selle