
I have a setup where Fluent Bit sends data in Elasticsearch format to HAProxy over SSL. HAProxy terminates the SSL and forwards the data to Fluentd. Here is the issue: Fluentd receives the data unescaped and thus can't forward it to ES.

Fluentd receives this data (I added the line breaks for readability on Stack Overflow):

2020-09-14 11:07:16 +0000 [error]: #0 failed to process request error_class=RuntimeError 
error="Received event is not json: {\"index\":{\"_index\":\"fluent_bit\",\"_type\":\"my_type
\"}}\n{\"@timestamp\":\"2020-09-14T11:07:15.173Z\",\"cpu_p\":3.583333333333333,\"user_p\":2.75,
\"system_p\":0.8333333333333334,\"cpu0.p_cpu\":4,\"cpu0.p_user\":1,\"cpu0.p_system
\":3,\"cpu1.p_cpu\":2,\"cpu1.p_user\":1,\"cpu1.p_system\":1,\"cpu2.p_cpu\":4,\"cpu2.p_user
\":3,\"cpu2.p_system\":1,\"cpu3.p_cpu\":6,\"cpu3.p_user\":4,\"cpu3.p_system\":2,\"cpu4.p_cpu
\":3,\"cpu4.p_user\":3,\"cpu4.p_system\":0,\"cpu5.p_cpu\":6,\"cpu5.p_user\":6,\"cpu5.p_system
\":0,\"cpu6.p_cpu\":4,\"cpu6.p_user\":3,\"cpu6.p_system\":1,\"cpu7.p_cpu\":4,\"cpu7.p_user
\":4,\"cpu7.p_system\":0,\"cpu8.p_cpu\":3,\"cpu8.p_user\":2,\"cpu8.p_system\":1,\"cpu9.p_cpu
\":3,\"cpu9.p_user\":3,\"cpu9.p_system\":0,\"cpu10.p_cpu\":1,\"cpu10.p_user\":0,\"cpu10.p_system
\":1,\"cpu11.p_cpu\":2,\"cpu11.p_user\":2,\"cpu11.p_system\":0}\n"

A couple of notes:

  1. I could send everything from Fluent Bit over HTTP and it would work, but in that case I lose the timestamp, index and index type.
  2. There must be a parser or filter that simply takes the unescaped JSON in Fluentd and transforms it, but I can't find any in practice (see the decoded payload after this list). I'm open to any solution, on any stack.
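
For reference, this is what that payload looks like once the escaping is removed: it is the Elasticsearch bulk (NDJSON) format, an action line followed by a document line, separated by newlines. The json parser in the Fluentd http source expects a single JSON object per request body, which is presumably why it rejects this:

{"index":{"_index":"fluent_bit","_type":"my_type"}}
{"@timestamp":"2020-09-14T11:07:15.173Z","cpu_p":3.583333333333333,"user_p":2.75, ...}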

Fluent Bit settings:

[OUTPUT]
    Name             es
    Match            *
    Host             <my-domain>
    Port             443
    Index            fluent_bit
    Type             my_type
    # + TLS settings

Fluentd settings:

<source>
  @type http
  port 8888
  bind 0.0.0.0
  body_size_limit 32m
  keepalive_timeout 10s
  add_remote_addr true
  format json
</source>

Basic HAProxy backend settings:

backend nodes
  mode          http
  option        forwardfor
  timeout       server 15m
  balance       roundrobin
  server        elastic-us-east-1a ip:port check inter 5000 downinter 500
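
The frontend doing the SSL termination is not shown; a minimal sketch of what it could look like (the frontend name and certificate path are placeholders, not from the actual setup):

frontend fluent_in
  mode            http
  bind            *:443 ssl crt /etc/haproxy/certs/my-domain.pem
  default_backend nodes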

1 Answer


The reason for this behavior is that you are using [OUTPUT] Name es but sending to a Fluentd server instead of ES. If you want logs to go through a centralized log forwarder (the Fluentd server) before reaching ES, use this:

FluentBit

...
[OUTPUT]
    Name          forward
...

Fluentd

...
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>

<match **>
  @type elasticsearch
  ...
</match>
...
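
For completeness, a sketch of what the Fluent Bit side of the forward setup could look like (the host is a placeholder; 24224 is the default forward port):

[OUTPUT]
    Name             forward
    Match            *
    Host             <fluentd-host>
    Port             24224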

FluentBit docs on log forwarding: https://docs.fluentbit.io/manual/pipeline/outputs/forward

Alternatively, there is the option to send from Fluent Bit to ES directly: https://docs.fluentbit.io/manual/pipeline/outputs/elasticsearch

  • I have tried. Without HAProxy in the flow, it works fine, but with it, I just get `invalid helo message` in fluentbit – David Bensoussan Sep 15 '20 at 09:30
  • Can't help with HAProxy here. You have the option to send through a fluentd log forwarder, and from there load-balance to multiple ES nodes using fluentd: https://docs.fluentd.org/output/elasticsearch#hosts-optional, without HAProxy. Unfortunately, fluentbit itself does not support ES load balancing: https://github.com/fluent/fluent-bit-kubernetes-logging/issues/43. – Max Lobur Sep 15 '20 at 09:38
  • I'm not trying to send HAProxy logs, just to forward the traffic through it. I use HAProxy for some rules at the moment, e.g. forbidding a delete, but since there is client authentication, it may be skippable – David Bensoussan Sep 15 '20 at 09:59
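
To illustrate the load-balancing suggestion from the comments, a minimal sketch of the Fluentd elasticsearch output using the hosts option from the linked docs (node names are placeholders):

<match **>
  @type elasticsearch
  # A comma-separated hosts list lets fluent-plugin-elasticsearch spread
  # requests across several ES nodes, replacing the HAProxy round robin.
  hosts es-node1:9200,es-node2:9200
  logstash_format true
</match>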