TL;DR: how can I best support multiple multiline patterns in ELK log parsing?
With the popularity of Docker, Python and ELK, one would think that getting properly parsed Python logs out of docker into ELK would be incredibly simple, but I can’t find a satisfactory way to do this.
I have a docker swarm-mode cluster running a mixture of many applications and services, including traefik, nginx, MongoDB, Postgres and a great many Python 3 services. I have filebeat collecting all of the docker logs and shipping them directly to an ES cluster. This works well except for the processing of multiline output.
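Concretely, the shipping side is roughly this (a sketch only — the host name is a placeholder, and this assumes filebeat's docker input type):

```yaml
# Sketch of the current collection setup; "es-cluster" is a placeholder host.
filebeat.inputs:
  - type: docker        # tails /var/lib/docker/containers/*/*.log
    containers.ids:
      - "*"             # collect from every container on the node
output.elasticsearch:
  hosts: ["es-cluster:9200"]
```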
Python's multiline stack traces typically look like this (in surrounding context):
2018-08-14 06:47:20,390:1:MainThread:INFO:gunicorn.error: Shutting down: Master
2018-08-14 07:15:05,181:9:ThreadPoolExecutor-0_0:ERROR:gunicorn.error: Error handling request /test?text=123&throw=1
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/gthread.py", line 279, in handle
keepalive = self.handle_request(req, conn)
File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/gthread.py", line 328, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/usr/local/lib/python3.6/dist-packages/hug/interface.py", line 100, in __call__
return __hug_internal_self._function(*args, **kwargs)
File "/apps/tester/test.py", line 11, in inner
raise RuntimeError
RuntimeError
2018-08-14 07:15:05,184:9:ThreadPoolExecutor-0_0:INFO:gunicorn.access: 10.255.0.2 - - [14/Aug/2018:07:15:05 +0000] "GET /test?text=123&throw=1 HTTP/1.1" 500 0 "-" "-"
2018-08-14 21:53:20,846:9:MainThread:INFO:gunicorn.error: Worker exiting (pid: 9)
I can parse the Python stack traces correctly with something like this:
multiline.pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}:"
multiline.negate: true
multiline.match: after
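To illustrate what those three settings do to the sample above, here is a small re-implementation of the folding behaviour (negate: true, match: after means any line NOT matching the pattern is appended to the preceding matching line). This is purely illustrative, not filebeat's actual code:

```python
import re

# Same pattern as in the filebeat config above.
TIMESTAMP = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}:"
)

def fold_multiline(lines):
    """Append each non-matching (continuation) line to the previous event."""
    events = []
    for line in lines:
        if TIMESTAMP.match(line) or not events:
            events.append(line)          # new event starts at a timestamp
        else:
            events[-1] += "\n" + line    # continuation: fold into previous
    return events

sample = [
    "2018-08-14 07:15:05,181:9:ThreadPoolExecutor-0_0:ERROR:gunicorn.error: Error handling request",
    "Traceback (most recent call last):",
    '  File "/apps/tester/test.py", line 11, in inner',
    "    raise RuntimeError",
    "RuntimeError",
    "2018-08-14 07:15:05,184:9:ThreadPoolExecutor-0_0:INFO:gunicorn.access: ...",
]

events = fold_multiline(sample)
# The whole traceback collapses into the first event, leaving 2 events.
```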
This correctly folds every line that doesn't start with the timestamp into the preceding line that does, but it makes a complete mess of all the non-Python logs.

I can get a reasonable multiline collection from the stack trace with this instead:
multiline.pattern: "^ |\t"
multiline.negate: false
multiline.match: after
but it will miss the last line of the stack trace (which is not indented).
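One could try widening the pattern to also catch that unindented final line, e.g. something like this (an untested sketch — it assumes the exception class name ends in Error or Exception, which is fragile, and it is still specific to the Python format):

```yaml
multiline.pattern: '^[ \t]|^Traceback|^[A-Za-z_.]*(Error|Exception)'
multiline.negate: false
multiline.match: after
```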
Without a mechanism to enable (or select between) one of several potential multiline rulesets, it seems that the only way to cope with a mixture of log formats from different services is to stop collecting from /var/lib/docker/containers/*/*.log and instead route the Python logs through a different docker logging driver. I can add logstash into the mix and feed it via that separate driver. This works ok(-ish), in that it does allow me a different multiline pattern, but it has two drawbacks:
1) the template and fields are completely different for logstash and filebeat.
2) this gives me two multiline patterns, but if I need three or more, it isn't really scalable to keep adding logstash processes on more ports.
There must be a better way to support multiple multiline patterns.
Ideas:
1) I could use another prospector collecting from a different log location for the Python containers, but I can't find a way to make docker's log driver write to a different location for only some containers.
2) I could use labels to discriminate between log formats (and thus multiline requirements), but filebeat doesn't seem to be able to filter on labels.
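If filebeat's autodiscover could be made to match on container labels, something like this hypothetical config is what I'm after (a sketch only — the `logformat` label is my own invention, and I haven't verified that the condition syntax works against docker labels):

```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            equals:
              docker.container.labels.logformat: "python"
          config:
            - type: docker
              containers.ids:
                - "${data.docker.container.id}"
              multiline.pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}:'
              multiline.negate: true
              multiline.match: after
```

Containers without the label would fall through to a default template with no multiline handling.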