
The problem is to push JSON logs collected by Filebeat to Elasticsearch with a defined _type and _id. The default Elasticsearch _type is "log" and the _id is something like "AVryuUKMKNQ7xhVUFxN2".

My log row:

 {"unit_id":10001,"node_id":1,"message":"Msg ..."}

Desired record in Elasticsearch:

"hits" : [ {
    "_index" : "filebeat",
    "_type" : "unit_id",
    "_id" : "10001",
    ...
    "_source" : {        
        "message" : "Msg ...",
        "node_id" : 1,
        ...
    }
} ]

I know how to do it with Logstash: just use document_id => "%{unit_id}" and document_type => "unit_id" in the output section. The goal is to use only Filebeat, because it is a very lightweight solution and no intermediate aggregation is needed here.

Dmitry

2 Answers


You can set a custom _type by using the document_type option in Filebeat. There is no way to set the _id directly in Filebeat as of version 5.x.

filebeat.prospectors:
- paths: ['/var/log/messages']
  document_type: syslog

You could use the Elasticsearch Ingest Node feature to set the _id field. You would need a script processor to copy a value from the event into the _id field. Once you have defined your pipeline, tell Filebeat to send its data through it using the output.elasticsearch.pipeline config option.
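A minimal sketch of such a pipeline, assuming the JSON line arrives in the message field (the pipeline name set-id-from-unit and the Painless script are illustrative, not from the Filebeat docs):

    PUT _ingest/pipeline/set-id-from-unit
    {
      "processors": [
        { "json":   { "field": "message", "add_to_root": true } },
        { "script": { "source": "ctx._id = ctx.unit_id.toString()" } }
      ]
    }

Then point Filebeat at that pipeline:

    output.elasticsearch:
      hosts: ["localhost:9200"]
      pipeline: "set-id-from-unit"

The json processor parses the raw line into fields, and the script processor copies unit_id into the document's _id metadata, which ingest scripts are allowed to modify.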

A J
  • I find this very interesting and very baffling....the _bulk api makes it very simple to set the _id field to whatever you want, and Filebeat uses the native _bulk api to send the data over to es....why on earth isn't it a simple configuration setting then in Filebeat.yml to set it against a scheme just like the index pattern....ala { "index": { "index": [value from config], "_id": [also value from config]}}. weird – Justin Jun 05 '19 at 17:37

You can now set a custom _id: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html
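Per that guide, the decode_json_fields processor has a document_id option that takes the _id from a field of the decoded JSON. A sketch for the question's log format (paths and input type are illustrative assumptions):

    filebeat.inputs:
    - type: log
      paths: ["/var/log/app/*.json"]
      processors:
        - decode_json_fields:
            fields: ["message"]
            target: ""
            document_id: "unit_id"

With this config Filebeat stores the unit_id value in @metadata._id, and the Elasticsearch output uses it as the document _id, so re-sent events overwrite instead of duplicating.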

Morti