12

I am trying to access the AWS Glue ETL job run ID from within that job's script. This is the Run ID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically with PySpark?

Zeitgeist

2 Answers

34

As documented in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, the run ID is passed to the Glue job as a command-line argument. You can read JOB_RUN_ID, along with other default (reserved) or custom job parameters, using the getResolvedOptions() function.

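import sys
from awsglue.utils import getResolvedOptions

# getResolvedOptions takes two arguments: the argument vector and a list of
# custom option names. An empty list is enough to read the reserved parameters.
args = getResolvedOptions(sys.argv, [])
job_run_id = args['JOB_RUN_ID']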

NOTE: JOB_RUN_ID is a default identity parameter, so it does not need to be listed in options (the second argument to getResolvedOptions()) for its value to be available at runtime in a Glue job.
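To make the "passed in as a command line argument" point concrete, here is a rough stand-in written with plain argparse. It is not the awsglue implementation, only a sketch of the idea: the run ID arrives in sys.argv as --JOB_RUN_ID, and custom options are the ones you name yourself. The function name resolve_job_args and the input_path option are hypothetical.

```python
import argparse


def resolve_job_args(argv, options=()):
    """Illustrative stand-in for getResolvedOptions (NOT the awsglue code).

    argv    -- argument vector shaped like sys.argv (script name first)
    options -- names of custom job parameters that must be present
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Reserved identity parameter: accepted even when not listed in options.
    parser.add_argument("--JOB_RUN_ID", default=None)
    for name in options:
        parser.add_argument("--" + name, required=True)
    # Ignore any other arguments the service may pass along.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


# Hypothetical argument vector, shaped like what a job script might receive.
example_argv = ["script.py", "--JOB_RUN_ID", "jr_5fc6d4ecf0248150067f2",
                "--input_path", "s3://bucket/key"]
resolved = resolve_job_args(example_argv, ["input_path"])
print(resolved["JOB_RUN_ID"])
```

Inside a real Glue Spark job you would call getResolvedOptions itself; the sketch only shows why the reserved parameter needs no entry in the options list.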

IamAshKS
Brett
  • This is the correct way - I was searching for how to do this, and your method works perfectly! – Eric Meadows Jan 23 '19 at 19:32
  • Is there a similar solution for python-shell jobs (not spark ETL) in glue? This does not seem to be working for those! – Adiga Oct 11 '20 at 19:32
  • @adiga any luck finding something for python shell? I'm trying to get the job name so that I can use the same script on multiple jobs, using the job name as a parameter. – mbourgon Oct 20 '20 at 18:29
  • @mbourgon Well I used a work around. If you can pass 'job_name' as the parameter, you can use 'get_job_runs' api method for glue client in boto3 and get the job_id by filtering 'RUNNING' jobs (assuming there is only one instance of the job running in glue). Apart from job_id, this will give many other info about the job, which if needed you may use to get some stats about the running job, and yes, from within the job itself. – Adiga Oct 23 '20 at 12:42
  • @Adiga did you find a better way to get the id? Has AWS support replied with a supported way? I made a new question specific to pyshell https://stackoverflow.com/questions/71699242/how-to-get-job-id-from-within-the-python-script-using-aws-glue-pyshell – pitchblack408 Mar 31 '22 at 21:19
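The workaround Adiga describes in the comments above for Python shell jobs could look roughly like the sketch below. It assumes the job name is already known (for example, passed in as a job parameter) and that the job's role is allowed to call get_job_runs. The helper name find_running_job_run_ids is hypothetical. The client is passed in so the function does not depend on how it is created.

```python
def find_running_job_run_ids(glue_client, job_name):
    """Return the IDs of all runs of job_name whose state is RUNNING.

    glue_client is expected to behave like boto3.client('glue').
    """
    run_ids = []
    kwargs = {"JobName": job_name}
    while True:
        page = glue_client.get_job_runs(**kwargs)
        for run in page.get("JobRuns", []):
            if run.get("JobRunState") == "RUNNING":
                run_ids.append(run["Id"])
        # get_job_runs is paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            return run_ids
        kwargs["NextToken"] = token


# Usage inside the job (assumes boto3 is available and the job name is known):
#   import boto3
#   ids = find_running_job_run_ids(boto3.client("glue"), "my-job-name")
#   job_run_id = ids[0] if len(ids) == 1 else None  # ambiguous if several run at once
```

As the comment notes, this only identifies the current run unambiguously when a single instance of the job is running.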
-1

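You can use the boto3 SDK for Python to access AWS services:

import boto3

def lambda_handler(event, context):
    # A single Glue client is enough; the region and endpoint are examples.
    glue = boto3.client(service_name='glue', region_name='us-east-2',
                        endpoint_url='https://glue.us-east-2.amazonaws.com')
    glue.start_crawler(Name='test_crawler')

    # myJob is not defined in this snippet; it is assumed to be a dict
    # describing the job, with its name under the 'Name' key.
    # start_job_run returns the ID of the run it has just started.
    my_new_job_run = glue.start_job_run(JobName=myJob['Name'])
    print(my_new_job_run['JobRunId'])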
Basanta
  • This assumes that I start the job from a lambda function. I was trying to access the job name from within the job script itself, after the job started on a schedule. – Zeitgeist Apr 12 '18 at 15:04