I'm attempting to create an AWS Lambda function that consumes CloudTrail events via an S3 trigger. The function should alert when a CloudWatch log stream is deleted. The key/value pairs:

'eventSource': 'logs.amazonaws.com'

and

'eventName': 'DeleteLogStream'

need to appear together in the same event record. I have the data in my event, but I am unable to capture and print it.

import boto3
import gzip
import json

SNS_TOPIC = "<SNS TOPIC ARN>"
SNS_SUBJECT = "<SUBJECT>"


s3_client = boto3.client('s3')
sns_client = boto3.client('sns')


def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

    
    # Fetch logs from S3
    s3_object = s3_client.get_object(
        Bucket=bucket,
        Key=key,
    )

    # Extract file and metadata from gzipped S3 object
    with gzip.open(s3_object['Body'], 'rb') as binaryObj:
        binaryContent = binaryObj.read()
    
    # Convert from binary data to text
    raw_logs = binaryContent.decode()
    
    # Change text into a dictionary
    dict_logs = json.loads(raw_logs)
    

    # Make sure json_logs key 'Records' exists
    if 'Records' in dict_logs.keys():
    
        print("Printing Dictionary Content: {} \n\n".format(dict_logs))
        
        if dict_logs['Records'][0]['eventSource'] == 'logs.amazonaws.com' and dict_logs['Records'][0]['eventName'] == 'DeleteLogStream':
            print("Found DeleteLogStream event from logs.amazonaws.com!")

        # Print Key-Value pair for each item found
        for key, value in dict_logs['Records'][0].items():
            # Account for values that are also dictionaries
            if isinstance(value, dict):
                print("Parent Key: {}".format(key))
                for k, v in value.items():
                    print("Subdict Key: {}".format(k))
                    print("Subdict Value: {}".format(v))
                continue
            else:
                print("Key: {}".format(key))
                print("Value: {}".format(value))

        
        alert_message = "The following log was found: <extracted log contents here>"
        
        # Publish message to SNS topic
        sns_response = sns_client.publish(
            TopicArn=SNS_TOPIC,
            Message=alert_message,
            Subject=SNS_SUBJECT,
            MessageStructure='string',
        )

    else:
        print("Records key not found")

Here is the result I get: (image: Result from Code)

My code prints the Keys/Values for debugging purposes. Any ideas why the 'DeleteLogStream' and 'logs.amazonaws.com' values aren't parsing out?

Sample json event below: https://raw.githubusercontent.com/danielkowalski1/general-scripts/master/sampleevent

Dan
  • Which part of the code is trying to get `DeleteLogStream` and `logs.amazonaws.com`? – Chetan Dec 21 '18 at 03:24
  • @ChetanRanpariya Thank you for your reply. I forgot to add that code line back in. It's right after I "Print Dictionary Content". – Dan Dec 21 '18 at 04:06
  • I've tried to create a dict out of it as well as JSON; each one has formatting issues. Also, the problem statement is not entirely clear. I believe you're asking why your line "Found DeleteLogStream event from logs.amazonaws.com!" never prints even though the keys are in the data. – Kevin S Dec 21 '18 at 04:30
  • You are receiving a lot of data in the event. The first record in the event has `eventSource` set to `s3.amazonaws.com`, which is why the condition `if dict_logs['Records'][0]['eventSource'] == 'logs.amazonaws.com'` fails. If you parse the JSON with a JSON parser, you will notice that there are multiple records with `logs.amazonaws.com` in `eventSource`, starting from the 5th record. So you need to do `if dict_logs['Records'][4]['eventSource'] == 'logs.amazonaws.com'`. – Chetan Dec 21 '18 at 09:57
  • The amount of data you are receiving in the event in the Lambda function is very large. Are you sure you have configured the proper trigger for the Lambda? It looks like you are getting CloudTrail events from all the AWS services in your account. – Chetan Dec 21 '18 at 09:59
  • @KevinS You are correct and I apologize for the confusion. I have edited my original post. Thank you for your response. – Dan Dec 21 '18 at 13:57
  • @ChetanRanpariya That does look like the problem. I was focused on AWS's CloudTrail-Lambda tutorial, whose JavaScript code sample uses only the first item, ['Records'][0], in the dictionary. Let me work on this later and I'll share my progress. Thank you for following up on this. – Dan Dec 21 '18 at 15:12
  • It looks like you have not translated the JS code to Python properly. – Chetan Dec 21 '18 at 17:16
  • Hey @ChetanRanpariya I would like to up-vote your hints but am unable to (brand new to SO). Is there a way I can do that? – Dan Dec 22 '18 at 23:44
  • You can greatly simplify your life by using [`jsonpath`](https://pypi.org/project/jsonpath/). – 9000 Dec 22 '18 at 23:56
  • No worries @Dan... Happy to help... :) – Chetan Dec 23 '18 at 00:57
  • @9000 Just checked it out. Looks amazing :) – Dan Dec 23 '18 at 01:35

1 Answer

Alright, I fixed the problem. The code below iterates over the entire Records list and checks each record's dictionary, so it finds every occurrence of 'DeleteLogStream' instead of inspecting only the first record.

EVENT_SOURCE = "logs.amazonaws.com"
EVENT_NAME = "DeleteLogStream"

    # Inside handler(), after dict_logs has been parsed.
    # Make sure the 'Records' key exists
    if 'Records' in dict_logs:
        for item in dict_logs['Records']:

            # Trigger only on a DeleteLogStream event from CloudWatch Logs;
            # .get() avoids a KeyError on records that lack either field
            if (item.get('eventSource') == EVENT_SOURCE
                    and item.get('eventName') == EVENT_NAME):
                # Grab other useful details for investigation
                # (each is None if the record does not carry the field)
                src_ip = item.get('sourceIPAddress')
                src_user = item.get('userIdentity', {}).get('arn')
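
For anyone who wants to try the matching logic without S3 or SNS in the way, here is a minimal, self-contained sketch of the same idea. The function name `find_matching_events`, the shape of the returned dictionaries, and the sample records (including the IP address and ARN) are made up for illustration; only the `eventSource`, `eventName`, `sourceIPAddress` and `userIdentity`/`arn` fields come from the code above.

```python
EVENT_SOURCE = "logs.amazonaws.com"
EVENT_NAME = "DeleteLogStream"


def find_matching_events(cloudtrail_log, event_source=EVENT_SOURCE, event_name=EVENT_NAME):
    """Return details for every record whose eventSource AND eventName both match.

    Hypothetical helper: `cloudtrail_log` is the dictionary produced by
    json.loads() on the decompressed CloudTrail file.
    """
    matches = []
    # Walk the whole Records list rather than looking only at Records[0]
    for item in cloudtrail_log.get("Records", []):
        if item.get("eventSource") == event_source and item.get("eventName") == event_name:
            matches.append({
                "sourceIPAddress": item.get("sourceIPAddress"),
                "userArn": item.get("userIdentity", {}).get("arn"),
            })
    return matches


# Made-up sample data: the matching record is NOT the first one,
# which is the situation described in the comments on the question.
sample_log = {
    "Records": [
        {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
        {"eventSource": "logs.amazonaws.com", "eventName": "CreateLogStream"},
        {
            "eventSource": "logs.amazonaws.com",
            "eventName": "DeleteLogStream",
            "sourceIPAddress": "203.0.113.10",
            "userIdentity": {"arn": "arn:aws:iam::123456789012:user/example"},
        },
    ]
}

matches = find_matching_events(sample_log)
for match in matches:
    print("DeleteLogStream by {} from {}".format(match["userArn"], match["sourceIPAddress"]))
```

In the Lambda handler, the returned list could then be used to build the SNS message, publishing only when the list is non-empty.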
Dan