
I am relatively new to the whole ELK setup, so please bear with me.

What I want to do is send the CloudTrail logs that are stored on S3 into a locally hosted (i.e. non-AWS) ELK setup. I am not using Filebeat anywhere in the setup; I believe it isn't mandatory, since Logstash can deliver data to Elasticsearch directly.

  1. Am I right here?

Once the data is in ES, I would simply want to visualize it in Kibana.

What I have tried so far, given that my ELK is up and running and that there is no Filebeat involved in the setup:

Using the Logstash S3 input plugin.

Contents of /etc/logstash/conf.d/aws_ct_s3.conf:

input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}

output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}

When Logstash is started with the above conf, everything seems to work fine. Using the head Google Chrome plugin, I can see that documents are continuously getting added to the specified index. In fact, when I browse the index I can see the data I need, and I can see the same on the Kibana side too.
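(Besides the head plugin, a quick way to keep an eye on both the document count and the on-disk size of the index is the _cat API; the command below just assumes the attack-* index pattern and the local Elasticsearch from my config above.)

  curl 'http://127.0.0.1:9200/_cat/indices/attack-*?v&h=index,docs.count,store.size'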

The data in each of these gzip files is of the format:

{
  "Records": [
    dictionary_D1,
    dictionary_D2,
    .
    .
    .
  ]
}
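(For context, each of those dictionaries is a single CloudTrail record. A trimmed-down sketch of one, with made-up values, just to show the shape:)

{
  "eventVersion": "1.05",
  "eventTime": "2019-01-09T10:15:30Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "DescribeInstances",
  "awsRegion": "ap-southeast-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": { "type": "IAMUser", "userName": "example-user" },
  "eventID": "11111111-2222-3333-4444-555555555555"
}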

And I want each of these dictionaries from the Records list above to be a separate event in Kibana. With some Googling around, I understand that I could use the split filter to achieve this. My aws_ct_s3.conf now looks something like:

input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}

filter {
  split {
    field => "Records"
  }
}

output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}

And with this I am in fact getting the data as I need it in Kibana.
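(To spell out my understanding of what the split filter does here, sketched with made-up record contents: one event per gzip file goes in, one event per CloudTrail record comes out.)

Before the filter, one event per gzip file:

  { "Records": [ { "eventName": "DescribeInstances" }, { "eventName": "ConsoleLogin" } ], "source": "gzfiles" }

After the filter, one event per record:

  { "Records": { "eventName": "DescribeInstances" }, "source": "gzfiles" }
  { "Records": { "eventName": "ConsoleLogin" }, "source": "gzfiles" }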

Now the problem is:

Without the filter in place, Logstash was shipping GBs of data from S3 to Elasticsearch; after applying the filter, indexing has stalled at roughly 5000 documents.

I do not know what I am doing wrong here. Could someone please help?

Current config:

java -XshowSettings:vm => Max Heap Size: 8.9 GB

elasticsearch jvm options => max and min heap size: 6GB

logstash jvm options => max and min heap size: 2GB

ES version - 6.6.0

LS version - 6.6.0

Kibana version - 6.6.0

This is how the current heap usage looks: [screenshot of current heap usage]
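(In case the screenshot is not visible: the same heap figures can be pulled from the cat nodes API, assuming the same local Elasticsearch instance as above.)

  curl 'http://127.0.0.1:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max'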

  • So the number has now increased from 5k to ~12k docs, and it has stagnated at that number. I did not make any change whatsoever, so I believe it's still sourcing the data, but it's rather slow! Wonder if that's really the issue, and if it is, how can I make it faster? – qre0ct Feb 03 '19 at 13:48
  • If LS or ES is overwhelmed, it will report such in the log files. – Alain Collins Feb 04 '19 at 20:25
