I am relatively new to setting up ELK, so please bear with me.
What I want to do is send the CloudTrail logs that are stored on S3 into a locally hosted (i.e. non-AWS) ELK setup. I am not using Filebeat anywhere in the setup. I believe it isn't mandatory, since Logstash can deliver data directly to ES.
Am I right here?
Once the data is in ES, I simply want to visualize it in Kibana.
What I have tried so far, given that my ELK stack is up and running and there is no Filebeat involved in the setup:
Using the S3 Logstash input plugin.
Contents of /etc/logstash/conf.d/aws_ct_s3.conf:
input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}
output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}
When Logstash is started with the above conf, everything works fine. Using the head plugin for Google Chrome, I can see that documents are continuously being added to the specified index. When I browse the index, I can see that it contains the data I need, and I can see the same data in Kibana.
Each of these gzip files contains data in the following format:
{
  "Records": [
    dictionary_D1,
    dictionary_D2,
    .
    .
    .
  ]
}
I want each of the dictionaries in the list above to be a separate event in Kibana. From some Googling around, I understand that I can use the split filter to achieve this. My aws_ct_s3.conf now looks like this:
input {
  s3 {
    access_key_id => "access_key_id"
    bucket => "bucket_name_here"
    secret_access_key => "secret_access_key"
    prefix => "AWSLogs/<account_number>/CloudTrail/ap-southeast-1/2019/01/09"
    sincedb_path => "/tmp/s3ctlogs.sincedb"
    region => "us-east-2"
    codec => "json"
    add_field => { "source" => "gzfiles" }
  }
}
filter {
  split {
    field => "Records"
  }
}
output {
  stdout { codec => json }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "attack-%{+YYYY.MM.dd}"
  }
}
And with this I am in fact getting the data as I need on Kibana.
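To make clear what I expect from the filter, here is a minimal Python sketch of my understanding of split on the "Records" field: one incoming event holding a list becomes one event per list element, with the other fields copied over. The function name split_records and the sample values are made up for illustration, and the handling of a non-list field is simplified (as far as I know, Logstash tags such events rather than passing them through silently).

```python
import copy


def split_records(event, field="Records"):
    """Return one event per element of event[field].

    Hypothetical helper that only mirrors my understanding of the
    Logstash split filter; it is not Logstash code.
    """
    items = event.get(field)
    if not isinstance(items, list):
        # Simplification: pass the event through unchanged.
        return [event]
    events = []
    for item in items:
        clone = copy.deepcopy(event)  # keep the other fields, e.g. "source"
        clone[field] = item           # replace the list with a single element
        events.append(clone)
    return events


# Made-up sample shaped like the "Records" structure shown earlier.
event = {
    "source": "gzfiles",
    "Records": [{"eventName": "A"}, {"eventName": "B"}],
}
events = split_records(event)
```

With this sample, events holds two entries, each carrying a single dictionary under "Records" plus the copied "source" field.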
Now the problem is this:
Without the filter in place, the data shipped by Logstash from S3 to Elasticsearch ran into GBs, whereas with the filter applied, indexing stops at only about 5000 documents.
I do not know what I am doing wrong here. Could someone please help?
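For comparing against the document count in Elasticsearch, this is a rough sketch of how the expected number of events in one CloudTrail file could be counted. It assumes each object is a gzip-compressed JSON document with a top-level "Records" list, as shown earlier; the function name count_records and the in-memory sample are made up for illustration, and fetching the objects from S3 is left out.

```python
import gzip
import json


def count_records(gz_bytes):
    """Count the entries in the top-level "Records" list of one gzip file."""
    payload = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    return len(payload.get("Records", []))


# Made-up stand-in for the contents of one downloaded .json.gz object.
sample = gzip.compress(
    json.dumps(
        {"Records": [{"eventName": "A"}, {"eventName": "B"}, {"eventName": "C"}]}
    ).encode("utf-8")
)
total = count_records(sample)
```

Summing this over every file under the prefix should give the number of documents I would expect in the index once the split filter is in place.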
Current config:
java -XshowSettings:vm => Max Heap Size: 8.9 GB
elasticsearch jvm options => max and min heap size: 6GB
logstash jvm options => max and min heap size: 2GB
ES version - 6.6.0
LS version - 6.6.0
Kibana version - 6.6.0