I'm trying to set up a full ELK stack for managing logs from our Kubernetes clusters. Our applications log either plain text or JSON objects. I want to be able to search the plain text logs, and also to index and search the individual fields of the JSON logs.
I have Filebeat running on each Kubernetes node, picking up the Docker container logs and enriching them with various Kubernetes fields plus a few fields we use internally. The complete filebeat.yml is:
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log

processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"

fields:
  kubernetes.cluster: <name of the cluster>
  environment: <environment of the cluster>
  datacenter: <datacenter the cluster is running in>
fields_under_root: true

output.logstash:
  hosts: ["logstash-logstash-headless:5044"]
Filebeat ships the resulting events to a central Logstash instance I have installed. In Logstash I attempt to parse the message field into a new field called message_parsed. The complete pipeline looks like this:
input {
  beats {
    port => 5044
    type => "beats"
    tags => ["beats"]
  }
}

filter {
  json {
    source => "message"
    target => "message_parsed"
    skip_on_invalid_json => true
  }
}

output {
  elasticsearch {
    hosts => [
      "elasticsearch-logging-ingest-headless:9200"
    ]
  }
}
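To make the intent concrete: if message contains valid JSON, say {"level":"info","msg":"request handled"}, the json filter should produce an event roughly like this (I keep the original message, since I'm not removing it):

{
  "message": "{\"level\":\"info\",\"msg\":\"request handled\"}",
  "message_parsed": {
    "level": "info",
    "msg": "request handled"
  }
}

Plain text lines fail JSON parsing and, because of skip_on_invalid_json => true, are passed through unchanged with no message_parsed field, which is exactly what I want for the text-only applications.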
The logs then land in an Elasticsearch cluster with separate data, ingest and master nodes. Apart from some CPU and memory configuration, the cluster runs entirely on default settings.
The trouble I'm having is that I do not control the contents of the JSON messages. They could contain any field with any type, and we have many cases where the same field name appears with values of differing types. One simple example is the field level, which is usually a string carrying the values "debug", "info", "warn" or "error", but we also run some software that outputs level as a numeric value. Other cases include error fields that are sometimes objects and sometimes strings, and date fields that are sometimes Unix timestamps and sometimes human-readable dates.
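For illustration, here are two made-up log lines of the kind that collide: both are valid JSON, but level is a string in one and a number in the other, error is an object in one and a string in the other, and date differs in format as well:

{ "level": "info", "error": { "code": 500, "detail": "upstream timeout" }, "date": 1617808651 }
{ "level": 30, "error": "upstream timeout", "date": "2021-04-07 15:57:31" }

Whichever shape Elasticsearch sees first determines the dynamic mapping for message_parsed.level, message_parsed.error and message_parsed.date, and documents with the other shape are then rejected.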
This of course makes Elasticsearch complain with a mapper_parsing_exception. Here's an example of one such error:
[2021-04-07T15:57:31,200][WARN ][logstash.outputs.elasticsearch][main][19f6c57d0cbe928f269b66714ce77f539d021549b68dc20d8d3668bafe0acd21] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1211193c>], :response=>{"index"=>{"_index"=>"logstash-2021.04.06-000014", "_type"=>"_doc", "_id"=>"L80NrXgBRfSv8axlknaU", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [message_parsed.error] tried to parse field [error] as object, but found a concrete value"}}}}
Is there any way I can make Elasticsearch handle that case?