
I have written a Python (PySpark) library which I am using in my AWS Glue scripts. The library logs in the usual way: import logging; log = logging.getLogger(__name__); log.info(message).

I would like these logs to appear in CloudWatch when my Glue job runs. How can I route these Python logs to CloudWatch?

RobinL

1 Answer


You need to attach a handler to the Python logger that sends its log records to CloudWatch.

One way to do this is with the CloudWatch handler provided by the watchtower library, which you make available to the Glue job as a zip file in the usual manner.


import logging
import os
import sys

from watchtower import CloudWatchLogHandler

from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# This will be used to make sure the log stream is prefixed with the job ID
# meaning it appears when you click 'see logs' in the Glue web UI.
job_run_id = args['JOB_RUN_ID']
lsn = f"{job_run_id}_custom"

# Set the default region so that boto3 creates its logs client in the right region
os.environ["AWS_DEFAULT_REGION"] = "eu-west-1"

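# Note: watchtower 2.x renamed these keyword arguments to log_group_name and
# log_stream_name; adjust the call if you bundle a newer release.
cw = CloudWatchLogHandler(log_group="/aws-glue/jobs/logs-v2", stream_name=lsn)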

# Demo of how to route logs made by requests (via urllib3) to CloudWatch
import requests
rlog = logging.getLogger("urllib3")
rlog.setLevel(logging.DEBUG)

rlog.handlers = []
rlog.addHandler(cw)

r = requests.get('https://www.bbc.co.uk/news')

# The logs generated by urllib3 will now appear in CloudWatch
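
The same approach should work for your own library rather than urllib3: attach the handler to the library's top-level logger, and the child loggers created by logging.getLogger(__name__) inside its modules will propagate their records to it. Below is a minimal sketch of that idea. The package name "mylib", the route_logger helper and the ListHandler class are all hypothetical; ListHandler is only an in-memory stand-in so the example runs without AWS credentials, and in the Glue job you would pass the cw handler from above instead.

```python
import logging


class ListHandler(logging.Handler):
    """Stand-in for the CloudWatch handler: collects formatted messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def route_logger(logger_name, handler, level=logging.INFO):
    """Attach `handler` to the named logger and return that logger."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    # Avoid attaching the same handler twice if this is called repeatedly.
    if handler not in logger.handlers:
        logger.addHandler(handler)
    return logger


# "mylib" is a hypothetical package name; use your library's top-level package.
handler = ListHandler()
route_logger("mylib", handler)

# A module inside the library would call logging.getLogger(__name__),
# which yields a child logger such as "mylib.transforms".
log = logging.getLogger("mylib.transforms")
log.info("loaded %d rows", 42)
log.debug("filtered out, because the effective level is INFO")
```

Because the handler sits on the parent logger, you do not need to touch each module of the library individually; lowering the level passed to route_logger to logging.DEBUG would let the debug message through as well.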
RobinL