2

My case is the following. I want to launch a cluster during working hours and terminate it after 18:00 and on weekends. The cluster will be used for a data science project. Years ago we would have used a boring crontab for this, but these days I prefer to do it with a Lambda function.

In boto3 I can launch a cluster (thanks to Jose Quinteiro), and this post describes it very well: How to launch and configure an EMR cluster using boto

How can I terminate a cluster with boto3 in the same Lambda function that starts it?

JanBennk

3 Answers

2

You can terminate the cluster with boto3 by calling terminate_job_flows:

import boto3

emr_client = boto3.client('emr')
# Replace the placeholder with the ID of the cluster you want to terminate.
emr_client.terminate_job_flows(JobFlowIds=['<your-cluster-id>'])
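When the Lambda function does not know the cluster ID in advance, one option is to look it up by name first. The following is a minimal sketch under that assumption, not the original poster's code: the helper name `terminate_clusters_by_name` and the cluster name `datascience-cluster` are hypothetical, while `list_clusters` and `terminate_job_flows` are regular boto3 EMR client calls.

```python
# Cluster states that count as "still alive" for this sketch.
ACTIVE_STATES = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"]


def terminate_clusters_by_name(emr_client, cluster_name):
    """Terminate every active cluster whose name equals cluster_name.

    Returns the list of cluster IDs that were sent to terminate_job_flows.
    """
    cluster_ids = []
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for cluster in page["Clusters"]:
            if cluster["Name"] == cluster_name:
                cluster_ids.append(cluster["Id"])
    if cluster_ids:
        emr_client.terminate_job_flows(JobFlowIds=cluster_ids)
    return cluster_ids


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    emr_client = boto3.client("emr")
    # "datascience-cluster" is a hypothetical name; use your own.
    terminated = terminate_clusters_by_name(emr_client, "datascience-cluster")
    return {"terminated": terminated}
```

Passing the client into the helper keeps the lookup logic separate from the Lambda entry point, which makes it easier to try out locally with a stub client.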

  • This post is quite old, but to jump in: we now have a single Lambda function scheduled by CloudWatch that kills the EMR cluster after working hours. Works like a charm :) – JanBennk Oct 26 '20 at 13:29
1

You could create a scheduled event in CloudWatch that triggers the Lambda function you are using.

Scheduled events use cron expressions, so you will be able to apply the same logic as with crontab. Once your function is triggered, you will need to determine from the event input whether it is a shutdown trigger.
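As a rough illustration of telling the two triggers apart, the sketch below assumes that each scheduled rule passes a constant JSON input such as `{"action": "start"}` or `{"action": "stop"}` to the function. The `action` key and the helper names `resolve_action`, `start_cluster` and `stop_cluster` are hypothetical, and the two stubs stand in for the real boto3 calls. A rule for the shutdown could, for example, use the schedule expression `cron(0 18 ? * MON-FRI *)`, which CloudWatch evaluates in UTC.

```python
VALID_ACTIONS = ("start", "stop")


def resolve_action(event):
    """Read the hypothetical 'action' key that the scheduled rule passes in."""
    action = (event or {}).get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unexpected or missing action: {action!r}")
    return action


def start_cluster():
    # Placeholder: call emr_client.run_job_flow(...) with your configuration.
    return {"status": "start requested"}


def stop_cluster():
    # Placeholder: call emr_client.terminate_job_flows(JobFlowIds=[...]).
    return {"status": "stop requested"}


def lambda_handler(event, context):
    # One function handles both schedules and branches on the rule's input.
    if resolve_action(event) == "start":
        return start_cluster()
    return stop_cluster()
```

Raising on an unknown action makes a misconfigured rule show up as a failed invocation instead of silently doing nothing.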

Stephen
1

By using an AWS CloudWatch event rule and an AWS Lambda function that checks for idle EMR clusters, you can achieve your goal. You also get visibility in the AWS Console and can easily enable or disable the rule.

With this need in mind, I have developed a small framework that implements the second solution mentioned above. It is an AWS-based solution that uses AWS CloudWatch and an AWS Lambda function running a Python script, which uses Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.

You specify the maximum idle time threshold, and the AWS CloudWatch event rule triggers an AWS Lambda function that queries all AWS EMR clusters in the WAITING state. For each cluster, it compares the current time with the cluster's ready time if no EMR steps have been added so far, or otherwise with the end time of the cluster's last step. If the threshold has been exceeded, the cluster is terminated, after termination protection has been removed if it was enabled. If not, the cluster is skipped.
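The idle check described above can be sketched roughly as follows. This is an illustration written for this answer, not the code from the repository: the function names and the `max_idle_minutes` parameter are assumptions, and removing termination protection unconditionally is a simplification of "if enabled". The boto3 calls used (`list_clusters`, `list_steps`, `set_termination_protection`, `terminate_job_flows`) are regular EMR client methods.

```python
from datetime import datetime, timedelta, timezone


def is_idle_too_long(ready_time, step_end_times, now, max_idle_minutes):
    """Return True if the cluster has been idle longer than the threshold."""
    # No steps yet: idle since the cluster became ready.
    # Otherwise: idle since the most recent step ended.
    idle_since = max(step_end_times) if step_end_times else ready_time
    return now - idle_since > timedelta(minutes=max_idle_minutes)


def terminate_idle_clusters(emr_client, max_idle_minutes, now=None):
    """Terminate WAITING clusters that exceeded the idle threshold."""
    now = now or datetime.now(timezone.utc)
    terminated = []
    paginator = emr_client.get_paginator("list_clusters")
    for page in paginator.paginate(ClusterStates=["WAITING"]):
        for cluster in page["Clusters"]:
            cluster_id = cluster["Id"]
            ready_time = cluster["Status"]["Timeline"]["ReadyDateTime"]
            steps = emr_client.list_steps(ClusterId=cluster_id)["Steps"]
            # Steps that never ran may have no end time, so skip those.
            end_times = [
                step["Status"]["Timeline"]["EndDateTime"]
                for step in steps
                if "EndDateTime" in step["Status"].get("Timeline", {})
            ]
            if not is_idle_too_long(ready_time, end_times, now, max_idle_minutes):
                continue  # still within the allowed idle time
            # Simplification: protection is switched off without checking it.
            emr_client.set_termination_protection(
                JobFlowIds=[cluster_id], TerminationProtected=False
            )
            emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
            terminated.append(cluster_id)
    return terminated
```

Keeping the time comparison in its own function makes the threshold logic easy to check without touching AWS.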

The schedule of the AWS CloudWatch event rule determines how often the AWS Lambda function checks for idle AWS EMR clusters.

You can disable the AWS CloudWatch event rule at any time to switch off this framework in a single click, without deleting its AWS CloudFormation stack.

The AWS Lambda function uses Python 3.7 as its runtime environment.

In your case, when creating the stack, you can specify the cron expression you need and the maximum idle time threshold for EMR clusters, in minutes.

You can get the code and use it from GitHub here: https://github.com/abdullahkhawer/auto-terminate-idle-emr

Any contributions, improvements and suggestions to this solution will be highly appreciated. :)

Abdullah Khawer