I'm using Graylog 2 with the Collector Sidecar to collect log files from several remote machines. Those log files contain Java stack traces, and Graylog lists every line of them as a separate message. I tried enabling the "Enable Multiline" option in the collector configuration and gave it a regular expression as the start pattern for the merging. Unfortunately, this doesn't seem to work.
The stack traces look something like this:
INFO | jvm 1 | 2018/08/07 10:23:30 | 2018-08-07 10:23:30,194 WARN ReportingMessageExtractor - Resource could not be extracted. Setting NO_EXTRACTION String.
INFO | jvm 1 | 2018/08/07 10:23:30 | java.lang.NullPointerException
INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware.reporting.ReportingMessageExtractor.extractResource(ReportingMessageExtractor.java:71)
INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware.reporting.ReportingMessageExtractor.filledReportingMessage(ReportingMessageExtractor.java:47)
INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware.reporting.ReportingMessageExtractor.extract(ReportingMessageExtractor.java:36)
INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware.reporting.facade.ReportingMessageFacade.extractReportingMessage(ReportingMessageFacade.java:54)
...
The regular expression I wanted to use to match these messages was this:
.*\|\s*at .*
I tested this expression on regex101.com, and it fully matches all lines below the original error message. I know this specific expression might not be the best, but I also tried at least a dozen other variations that matched when tested on regex101 as well.
However, this Graylog server doesn't seem to apply it correctly: all lines are still listed as separate messages. One of our other Graylog servers merges stack traces correctly. The only difference is that the messages in this case are prefixed with the "INFO | jvm 1 | timestamp" block you can see above.
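To make the regex test reproducible outside regex101, here is a minimal sketch that runs the pattern against the sample lines from above. It uses Python's `re` module as a stand-in for the Go regexp engine that Filebeat uses, which is an assumption that should hold for a pattern this simple. The helper name `matching_flags` is made up for illustration.

```python
import re

# Start pattern from the question. Python's `re` is only a stand-in for
# the Go regexp engine used by Filebeat (assumption).
PATTERN = re.compile(r".*\|\s*at .*")

# Sample lines copied from the log excerpt above.
SAMPLE_LINES = [
    "INFO | jvm 1 | 2018/08/07 10:23:30 | 2018-08-07 10:23:30,194 WARN "
    "ReportingMessageExtractor - Resource could not be extracted. "
    "Setting NO_EXTRACTION String.",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | java.lang.NullPointerException",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware."
    "reporting.ReportingMessageExtractor.extractResource"
    "(ReportingMessageExtractor.java:71)",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware."
    "reporting.ReportingMessageExtractor.filledReportingMessage"
    "(ReportingMessageExtractor.java:47)",
]


def matching_flags(lines):
    """Return one boolean per line: does the pattern match that line?"""
    return [PATTERN.search(line) is not None for line in lines]


if __name__ == "__main__":
    for flag, line in zip(matching_flags(SAMPLE_LINES), SAMPLE_LINES):
        print("MATCH   " if flag else "NO MATCH", line[:70])
```

In this sketch only the `at ...` lines match; the WARN line and the `java.lang.NullPointerException` line do not.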
I'll additionally post the generated filebeat.yml below:
filebeat:
  prospectors:
  - encoding: plain
    exclude_files: []
    fields:
      collector_node_id: DHLAPP-A9011.tcb.deutschepost.de
      gl2_source_collector: 892201ae-4d4c-4c01-b994-8adb5999fa5a
      type: log
    ignore_older: 0
    multiline:
      match: after
      negate: false
      pattern: (.*\|\s*at .*)
    paths:
    - /###/###/###/*.log
    scan_frequency: 10s
    tail_files: true
    type: log
output:
  logstash:
    hosts:
    - localhost:5044
path:
  data: /var/cache/graylog/collector-sidecar/filebeat/data
  logs: /var/log/graylog/collector-sidecar
tags:
- exampletag
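For reference, this is how I understand the `multiline` settings in that config, written as a rough Python sketch. It is a simplified model of the behaviour the Filebeat documentation describes for `negate: false` with `match: after` (consecutive lines that match the pattern are appended to the previous line that doesn't match), not Filebeat's actual implementation. The function name `merge_after` is made up, and Python's `re` stands in for Go's regexp engine.

```python
import re

# Pattern exactly as it appears in the generated filebeat.yml.
PATTERN = r"(.*\|\s*at .*)"

# Shortened copy of the log excerpt from the question.
SAMPLE_LINES = [
    "INFO | jvm 1 | 2018/08/07 10:23:30 | 2018-08-07 10:23:30,194 WARN "
    "ReportingMessageExtractor - Resource could not be extracted. "
    "Setting NO_EXTRACTION String.",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | java.lang.NullPointerException",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware."
    "reporting.ReportingMessageExtractor.extractResource"
    "(ReportingMessageExtractor.java:71)",
    "INFO | jvm 1 | 2018/08/07 10:23:30 | at com.###.###.###.middleware."
    "reporting.ReportingMessageExtractor.filledReportingMessage"
    "(ReportingMessageExtractor.java:47)",
]


def merge_after(lines, pattern, negate=False):
    """Simplified model of multiline merging with `match: after`.

    A line counts as a continuation when the pattern matches it
    (or when it does not match, if negate is True). Continuation
    lines are appended to the event started by the previous line.
    """
    regex = re.compile(pattern)
    events = []
    for line in lines:
        is_continuation = (regex.search(line) is not None) != negate
        if is_continuation and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


if __name__ == "__main__":
    for number, event in enumerate(merge_after(SAMPLE_LINES, PATTERN), 1):
        print(f"--- event {number} ---")
        print(event)
```

Under this model I would expect two events for the excerpt: the WARN line on its own, and the `java.lang.NullPointerException` line with the `at ...` lines appended to it.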
I would be very grateful if anyone could give me a hint as to why multiline merging doesn't work in some cases, even with regular expressions that were tested beforehand and do match.