
I have the following setup:

FileBeat -> Logstash -> Elasticsearch -> Kibana (all 5.1.1)

When I push a JSON log file through Filebeat and look at it in Kibana, the same log entries appear 3-4 times (duplicates). The Filebeat logs suggest that this happens because Filebeat does not receive an acknowledgement for the logs it sends, so it keeps resending them. To stop receiving duplicate documents, I think I will have to set a document_id in the Logstash config file, i.e.:

output {
    elasticsearch {
        document_id => "%{offset}"
        index => "ap-index"
        hosts => ["localhost:9222"]
    }
}

My question is: is the offset field unique for each document, and is this a correct way to stop receiving duplicates?

Vishal Kinjavdekar

1 Answer


If Filebeat is not receiving acknowledgements from Logstash, this is a sign of a problem, and you should find the root cause first (there could be congestion in your pipeline).

The offset is not unique if you have more than one log file or do any log rotation. If your log messages contain timestamps, I recommend using a fingerprint filter to generate a hash of the message, combined with the Filebeat hostname and offset as in the config below. Then use the fingerprint hash as the document ID in Elasticsearch.

input {
  beats {
    port => 5044
  }
}

filter {
  fingerprint {
    method => "SHA1"
    key    => "some_random_hmac_key"
    source => ["[beat][hostname]", "offset", "message"]
    concatenate_sources => true
    target => "[@metadata][id]"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    document_id => "%{[@metadata][id]}"
  }
}
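
To see why a hash-based ID behaves differently from a bare offset, here is a minimal Python sketch that simulates indexing with a dictionary. It is only an illustration: the hostnames, messages, and the `|` separator are made up, and the byte layout does not claim to match what the Logstash fingerprint filter produces internally. The point is only that a resent event maps to the same ID and overwrites itself, while two different lines at the same offset stay separate.

```python
import hashlib
import hmac

# Same placeholder key as in the config above; any fixed secret would do.
HMAC_KEY = b"some_random_hmac_key"


def fingerprint(hostname: str, offset: int, message: str) -> str:
    """Keyed SHA1 over hostname, offset and message (illustrative layout only)."""
    payload = f"{hostname}|{offset}|{message}".encode("utf-8")
    return hmac.new(HMAC_KEY, payload, hashlib.sha1).hexdigest()


def index_events(events, make_id):
    """Simulate indexing: a document with an existing ID is overwritten."""
    index = {}
    for event in events:
        index[make_id(event)] = event
    return index


# Two different lines that start at the same offset, e.g. in two files
# or in one file before and after rotation (hypothetical sample data).
line_a = {"hostname": "web-1", "offset": 0, "message": '{"ts": 1, "msg": "start"}'}
line_b = {"hostname": "web-1", "offset": 0, "message": '{"ts": 9, "msg": "rotated"}'}

# line_a is resent twice because no acknowledgement arrived.
sent = [line_a, line_b, line_a, line_a]

by_offset = index_events(sent, lambda e: str(e["offset"]))
by_hash = index_events(
    sent, lambda e: fingerprint(e["hostname"], e["offset"], e["message"])
)

print(len(by_offset))  # 1: the two distinct lines collide on one ID
print(len(by_hash))    # 2: resends collapse, distinct lines are kept
```

Two files on the same host could still contain an identical line at an identical offset. If your events carry the file path, adding that field to the fingerprint sources would separate them.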
A J
  • I might have more than one log file, and yes, there can be log rotation. Also, my timestamps may not be unique. In that case, will the hash still be different for each document? – Vishal Kinjavdekar Apr 13 '17 at 05:03
  • You can modify that hash to incorporate the offset given by Filebeat. If your timestamps are low resolution (for example, in seconds), I would do this. I'll update the answer to include the offset. – A J Apr 13 '17 at 19:34