1

I'm trying to implement EFK stack (with Fluent Bit) in my k8s cluster. My log file I would like to parse sometimes is oneline and sometimes multiline:

2022-03-13 13:27:04 [-][-][-][error][craft\db\Connection::open] SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
2022-03-13 13:27:04 [-][-][-][info][application] $_GET = []

$_POST = []

$_FILES = []

$_COOKIE = [
    '__test1' => 'x'
    '__test2' => 'x2'
]

$_SERVER = [
    '__test3' => 'x3'
    '__test2' => 'x3'
]

When I'm checking captured logs in Kibana I see that all multiline logs are separated into single lines, which is of course not what we want to have. I'm trying to configure a parser in fluent bit config which will interpret multiline log as one entry, unfortunately with no success.

I've tried this:

[PARSER]
    Name        MULTILINE_MATCH
    Format      regex
    Regex       ^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \[-]\[-]\[-]\[(?<level>.*)\]\[(?<where>.*)\] (?<message>[\s\S]*)
    Time_Key    time
    Time_Format %b %d %H:%M:%S

In k8s all fluent bit configurations are stored in config map. So here's my whole configuration of fluent bit (the multiline parser is at the end):

kind: ConfigMap
metadata:
  name: fluent-bit
  namespace: efk
  labels:
    app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

    [PARSER]
        Name        MULTILINE_MATCH
        Format      regex
        Regex       ^\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2} \[-]\[-]\[-]\[(?<level>.*)\]\[(?<where>.*)\] (?<message>[\s\S]*)
        Time_Key    time
        Time_Format %b %d %H:%M:%S
Murakami
  • 3,474
  • 7
  • 35
  • 89
  • Which version of Kubernetes did you use and how did you set up the cluster? Did you use bare metal installation or some cloud provider? It is important to reproduce your problem. – Mikołaj Głodziak Mar 14 '22 at 15:08

1 Answers1

1

Starting from Fluent Bit v1.8, You can use the multiline.parser option as below. docker and cri multiline parsers are predefined in fluent-bit.

[INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

https://docs.fluentbit.io/manual/pipeline/inputs/tail#multiline-and-containers-v1.8

Kavinda Gayashan
  • 387
  • 4
  • 17