I've prepared a simple lambda function in AWS to terminate long running EMR clusters after a certain threshold is reached. This code snippet is tested locally and is working perfectly fine. Now I pushed it into a lambda, took care of the library dependencies, so that's also fine. This lambda is triggered from a CloudWatch rule, which is a simple cron schedule. I'm using an existing IAM rule which has these 7 policies attached to it.
- SecretsManagerReadWrite
- AmazonSQSFullAccess
- AmazonS3FullAccess
- CloudWatchFullAccess
- AWSGlueServiceRole
- AmazonSESFullAccess
- AWSLambdaRole
I've configured the lambda to be inside the same vpc and security group as that of the emr(s). Still I'm getting this error consistently:
An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:sts::xyz:assumed-role/dev-lambda-role/terminate_inactive_dev_emr_clusters is not authorized to perform: elasticmapreduce:ListClusters on resource: *: ClientError
Traceback (most recent call last):
File "/var/task/terminate_dev_emr.py", line 24, in terminator
ClusterStates=['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING']
File "/var/runtime/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:sts::xyz:assumed-role/dev-lambda-role/terminate_inactive_dev_emr_clusters is not authorized to perform: elasticmapreduce:ListClusters on resource: *
My lambda function looks something like this:
import pytz
import boto3
from datetime import datetime, timedelta
def terminator(event, context):
''' cluster lifetime limit in hours '''
LIMIT = 7
TIMEZONE = 'Asia/Kolkata'
AWS_REGION = 'eu-west-1'
print('Start cluster check')
emr = boto3.client('emr', region_name=AWS_REGION)
local_tz = pytz.timezone(TIMEZONE)
today = local_tz.localize(datetime.today(), is_dst=None)
lifetimelimit = today - timedelta(hours=LIMIT)
clusters = emr.list_clusters(
CreatedBefore=lifetimelimit,
ClusterStates=['STARTING', 'BOOTSTRAPPING', 'RUNNING', 'WAITING']
)
if clusters['Clusters'] is not None:
for cluster in clusters['Clusters']:
description = emr.describe_cluster(ClusterId=cluster['Id'])
if(len(description['Cluster']['Tags']) == 1
and description['Cluster']['Tags'][0]['Key'] == 'dev.ephemeral'):
print('Terminating Cluster: [{id}] with name [{name}]. It was active since: [{time}]'.format(id=cluster['Id'], name=cluster['Name'], time=cluster['Status']['Timeline']['CreationDateTime'].strftime('%Y-%m-%d %H:%M:%S')))
emr.terminate_job_flows(JobFlowIds=[cluster['Id']])
print('cluster check done')
return
Any help is appreciated.