2

I'm using cloudwatch event to trigger a labmda when a new ecs instance starts up.

I'm a bit mystified by the logs, it looks like the function is called then fails silently and then called again and runs successfully with the same event input.

I understand that lambda functions can be called multiple times from cloudwatch for the same event because the delivery contract is at least once and I plan on making changes to the function so it's idempotent.

My question is, what is happening when it's called the first time and just stops running?

More info based on Jarmod's question

The first time the function is called only the following tags are available:

'Tags': [{'Key': 'aws:ec2launchtemplate:id', 'ResourceId': 'i-0a0500abc17bc7e9e', 'ResourceType': 'instance', 'Value': 'lt-01e5585a011e48b97'}]

This means my code was failing on the following line:

if 'torres' in ec2_response['Tags'][0]['Value']

Since when I set the filter to only get the Name tag, the tags were coming back empty the first time. I'm assuming that the lambda was being re-run based on Lambda re-run documentation and the second time it's called the Name tag is there and everything is great.

So my question becomes, why am I not getting an error in my lambda logs?

Here's the log output:

2019-01-01 11:34:46
START RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8 Version: $LATEST
Event: {'version': '0', 'id': '12f0dc8e-2c49-be57-823f-ab6229b12804', 
'detail-type': 'EC2 Instance State-change Notification', 'source': 
'aws.ec2', 'account': '701704546303', 'time': '2019-01-01T14:34:46Z', 
'region': 'us-west-2', 'resources': ['arn:aws:ec2:us-west- 
2:701704546303:instance/i-0a4af714ff4396d55'], 'detail': {'instance-id': 'i-0a4af714ff4396d55', 'state': 'running'}}
, Context: <bootstrap.LambdaContext object at 0x7fc8f40525f8>
Making ec2 name request
END RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8
REPORT RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8 Duration: 366.46 ms  Billed Duration: 400 ms Memory Size: 128 MB Max Memory Used: 39 MB  

2019-01-01 11:35:46
START RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8 Version: $LATEST
Event: {'version': '0', 'id': '12f0dc8e-2c49-be57-823f-ab6229b12804',                     
'detail-type': 'EC2 Instance State-change Notification', 'source': 
'aws.ec2', 'account': '701704546303', 'time': '2019-01-01T14:34:46Z', 
'region': 'us-west-2', 'resources': ['arn:aws:ec2:us-west- 
2:701704546303:instance/i-0a4af714ff4396d55'], 'detail': {'instance- 
id': 'i-0a4af714ff4396d55', 'state': 'running'}}
, Context: <bootstrap.LambdaContext object at 0x7fc8f724ca58>
Making ec2 name request
torres instance, starting ecs task
END RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8
REPORT RequestId: 63190c9b-0dd2-11e9-b6ed-256f780af4a8  Duration: 1022.97 ms    Billed Duration: 1100 ms Memory Size: 128 MB    Max Memory Used: 40 MB  

Here's the function:

def lambda_handler(event, context):
    print('Event: {}\n, Context: {}'.format(event, context))
    if event['detail-type'] != 'EC2 Instance State-change Notification':
        print('Wrong event detail-type')
        ret = {
            'status': 'Wrong event detail-type'
        }
        print(event)
    else:
        print('Making ec2 name request')
        ec2_client = boto3.client('ec2', region_name='us-west-2')
        try:
            ec2_response = ec2_client.describe_tags(
                Filters=[
                    {
                        'Name': 'resource-id',
                        'Values': [
                            event['detail']['instance-id']
                        ]
                    },
                    {
                        'Name': 'key',
                        'Values': [
                            'Name'
                        ]
                    }
                ]
            )
        except Exception as e:
            print('Failed ec2 call')
            print(e)
            raise e
        if 'torres' in ec2_response['Tags'][0]['Value']:
            print('torres instance, starting ecs task')
            ecs_client = boto3.client('ecs', region_name='us-west-2')
            try:
                ecs_response = ecs_client.run_task(
                    cluster='torres',
                    taskDefinition='torres:16'
                )
            except Exception as e:
                print('Failed ecs call')
                print(e)
                raise e
            task_definition_arns = [t['taskDefinitionArn'] for t in ecs_response['tasks']]
            ecs_status = ecs_response['ResponseMetadata']['HTTPStatusCode']

            ret = {
                'task_definition_arns': task_definition_arns,
                'ecs_status': ecs_status
            }
        else:
            print('Wrong instance')
            ret = {
                'status': 'Wrong instance',
                'event': event
            }

    return ret
sshevlyagin
  • 1,310
  • 2
  • 16
  • 26
  • Is that the complete lambda function? I cant find the `making asg lifecycle hook call` output... - And can you make sure, that a Lambda timeout is not the issue? This is usually my first impulse when something in a possible cold-start-situation fails ;-) – Maurice Jan 01 '19 at 23:28
  • What does ec2_response contain in the first case? Does it actually have any valid content with tags? – jarmod Jan 01 '19 at 23:53
  • @Maurice - that was from an earlier version of it, I removed that from the logs it doesn't affect the problem. – sshevlyagin Jan 03 '19 at 00:06
  • @jarmod - you nailed it, the tag isn't there. I'm not sure why the exception didn't make it to the log though. I guess that's the better question now :) – sshevlyagin Jan 03 '19 at 00:07
  • You could include a try/catch around the API calls and log the exception. I'm assuming that the root cause is a timing issue - the ECS resource is launched before the tags are applied (launch and tag are not a single atomic operation presumably) and your Lambda function is being invoked after the launch but before the application of tags (and then the automatic Lambda retry is happening *after* the application of tags, so your Lambda works second time around). – jarmod Jan 03 '19 at 00:13
  • According to https://docs.aws.amazon.com/lambda/latest/dg/python-exceptions.html the error of me looking for a dictionary key that's not there should be showing up in the logs, but it's not. That's the new mystery, which sounds like an AWS issue? I know how I'll work around this, but I'm curious about why the logging isn't working as expected. – sshevlyagin Jan 03 '19 at 00:31
  • 1
    @sshevlyagin This is apparently an [Issue with the Python 3.7 Runtime](https://stackoverflow.com/questions/54002903/aws-lambda-python-3-7-runtime-exception-logging) - switch back to 3.6 and you should see the exception. – Maurice Jan 03 '19 at 10:06

0 Answers0