
We have a large number of EC2 instances that have been running in AWS for about a year. We are now trying to clean up the unused instances, based on the username under which each instance was launched.

I have tried downloading the CloudTrail logs from the S3 bucket and filtering on the username and the 'RunInstances' event, so that I can find the user who launched each instance along with the instance details.

The following is the script I used to download all the CloudTrail logs into a single folder, unzip them, filter them by the 'RunInstances' event, and count the matching events.

I need help with retrieving the usernames from each log with a 'RunInstances' event and then stopping those instances.

My script:

#!/bin/bash

s3url="s3://S3bucket/AWSCloudtrailLogs/<accountno>/CloudTrail/region/2016"

for (( i=1; i<=12; i++ ))
do
   for (( j=1; j<=31; j++ ))
   do
        # Zero-pad the month and day to match the CloudTrail key layout (2016/MM/DD/)
        mm=$(printf '%02d' "$i")
        dd=$(printf '%02d' "$j")
        aws s3 cp "$s3url/$mm/$dd/" ~/test/ --recursive
   done
done

gunzip ~/test/*.gz

grep -h 'RunInstances' ~/test/*.json > ~/test/result.txt

grep -o 'RunInstances' ~/test/result.txt | wc -l
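
For what it's worth, this is roughly the extraction I am trying to get to (a Python sketch only; the field names are from a sample RunInstances record, and I haven't got it working end to end):

#!/usr/bin/env python
# Sketch only: pull the username and instance IDs out of each
# unzipped CloudTrail log file that contains a RunInstances event.
import glob
import json
import os

for path in glob.glob(os.path.expanduser('~/test/*.json')):
    with open(path) as f:
        trail = json.load(f)
    for record in trail.get('Records', []):
        if record.get('eventName') != 'RunInstances':
            continue
        # userName exists for IAM users; fall back to the ARN otherwise
        identity = record.get('userIdentity', {})
        user = identity.get('userName', identity.get('arn', 'unknown'))
        # responseElements is null when the call failed, so guard for it
        items = (record.get('responseElements') or {}).get('instancesSet', {}).get('items', [])
        for item in items:
            print('%s %s %s' % (user, item.get('instanceId'), record.get('eventTime')))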

Is there any way I can do this without downloading the zip files, getting the info directly from the S3 bucket itself? The current approach is taking a lot of time because we have over 1 million log files.

I'm open to a solution in any programming language or script.

Thanks for your support.

Ali
Are you running this on an EC2 instance or on your PC? The connection to S3 will be a lot faster from an EC2 instance. Since you need to unzip, I don't really see an alternative to downloading, except maybe https://github.com/s3fs-fuse/s3fs-fuse – at0mzk Aug 24 '16 at 08:01

2 Answers


What do you mean by "directly get info from the S3 bucket itself"? S3 is a storage service, not a compute service. You can avoid writing to a disk file by processing each object in memory instead of saving it, but you still have to download it.

Suggestions:

  • Don't download all trails for all regions for the entire year. It will take a very long time.
  • Process one month's data for one region at a time, then repeat for the other months/regions.
  • Use Python/Boto3, which has many handy features for processing trail logs and extracting the info you want (see the sketch below).
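
A minimal sketch of that in-memory approach (my own illustration, not the answerer's code), assuming credentials are configured and the bucket/prefix placeholders are replaced with real values:

import gzip
import json

import boto3

s3 = boto3.client('s3')
bucket = 'S3bucket'  # placeholder bucket name
prefix = 'AWSCloudtrailLogs/<accountno>/CloudTrail/region/2016/08/'  # one month

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        # Read the gzipped object into memory; nothing is written to disk
        body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read()
        trail = json.loads(gzip.decompress(body).decode('utf-8'))
        for record in trail.get('Records', []):
            if record.get('eventName') != 'RunInstances':
                continue
            identity = record.get('userIdentity', {})
            items = (record.get('responseElements') or {}).get('instancesSet', {}).get('items', [])
            print(identity.get('userName', identity.get('arn')),
                  [i.get('instanceId') for i in items])

Narrowing the prefix to a single month/region, as suggested above, is what keeps the listing and download volume manageable.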

I do this every day (only for the previous day's logs), but I just can't give the code.

helloV

Instead of downloading all the S3 logs and then querying them, why not use something like Athena? It will save you time and reduce your effort considerably. CloudTrail does provide sufficient information about who launched an instance, and its SDK is available; you could write a Python script using boto3 and perhaps set up a cron job to run it every day.
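
To illustrate the stopping step, a hedged boto3 sketch (the region and instance IDs are placeholders; the IDs would come from whatever CloudTrail/Athena query you run first):

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client('ec2', region_name='us-east-1')  # placeholder region
instance_ids = ['i-0123456789abcdef0']  # placeholder: IDs found for the user

try:
    # DryRun verifies permissions without actually stopping anything
    ec2.stop_instances(InstanceIds=instance_ids, DryRun=True)
except ClientError as e:
    if e.response['Error']['Code'] != 'DryRunOperation':
        raise

ec2.stop_instances(InstanceIds=instance_ids)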

Brian