
I'm planning to write some AWS Glue ETL jobs in PySpark, and I want them to be triggered whenever a new file is dropped in an S3 location, just as we trigger AWS Lambda functions using S3 events.

But I only see very limited options for triggering a Glue ETL script. Any help on this would be highly appreciated.

Aakash Basu

1 Answer


The following should work to trigger a Glue job from AWS Lambda. Configure the Lambda function with an S3 event trigger on the appropriate bucket, and attach an IAM role with permissions that allow the Lambda to start the AWS Glue job on the user's behalf.
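As a sketch, the Lambda's execution role would need a policy along these lines (the wildcard `Resource` is just a placeholder for this example; in practice you would scope it to your job's ARN):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["glue:StartJobRun", "glue:GetJobRun"],
      "Resource": "*"
    }
  ]
}
```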

import boto3

print('Loading function')

def lambda_handler(event, context):
    """Invoked by the S3 event; kicks off the configured Glue job."""
    glue = boto3.client('glue')
    gluejobname = "YOUR GLUE JOB NAME"

    try:
        # Start the Glue job, then look up the state of the new run
        runId = glue.start_job_run(JobName=gluejobname)
        status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        print(e)
        raise
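If the Glue job needs to know which object triggered it, you can pull the bucket and key out of the S3 event notification and pass them along as job arguments. A minimal sketch, assuming the standard S3 event record shape; the `--s3_bucket` and `--s3_key` argument names are my own choice for this example, not anything Glue requires:

```python
def extract_s3_object(event):
    """Return (bucket, key) from the first record of an S3 event notification."""
    record = event['Records'][0]['s3']
    return record['bucket']['name'], record['object']['key']

def build_glue_arguments(event):
    """Build the Arguments dict for glue.start_job_run.
    The '--s3_bucket' / '--s3_key' names are hypothetical; the Glue
    script would read them back with getResolvedOptions."""
    bucket, key = extract_s3_object(event)
    return {'--s3_bucket': bucket, '--s3_key': key}
```

In the handler above you would then call `glue.start_job_run(JobName=gluejobname, Arguments=build_glue_arguments(event))` instead of passing only the job name.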
lxop
Yuva