The Problem

I have a Fluent Bit service (running in a Docker container) that needs to tail log files (mounted from the host into the container) and then forward those logs to Elasticsearch. For this PoC I create a new log file every minute (e.g. spring-boot-logger-2021-05-25_10_53.0.log, spring-boot-logger-2021-05-25_10_54.0.log, etc.).

I can see that Fluent Bit picks up all the files, but it only reads and forwards the first few lines of each file (each log entry is a single line formatted as JSON). Only when the Fluent Bit container is restarted does it read and forward the rest of the files.

To demonstrate this issue, I have a script that generates 200 log entries over a period of 100 seconds (i.e. 2 logs per second). After running this script, I get only a small number of entries in Elasticsearch, as shown in this image. Here one can see that there are only 72 entries, with large gaps between them.

Once I restart the Fluent Bit container, it processes the rest of the files and fills in all the logs, as shown in this image.

Here is my Fluent Bit config file:

[SERVICE]
    Flush     5
    Daemon    off
    Log_Level debug
    Parsers_File   /fluent-bit/etc/parsers.conf

[INPUT]
    Name  tail
    Parser docker
    Path  /var/log/serviceA/*.log
    Tag   service.A
    DB    /var/db/ServiceA
    Refresh_Interval 30

[INPUT]
    Name  tail
    Parser docker
    Path  /var/log/serviceB/*.log
    Tag   service.B
    DB    /var/db/ServiceB
    Refresh_Interval 30

[OUTPUT]
    Name  stdout
    Match service.*

[OUTPUT]
    Name  es
    Host  es01
    Port  9200
    Logstash_Format On
    tls   Off
    Match service.*

What I've tried

I've tried the following:

  • Increased the flush rate to 1s
  • Shortened the Refresh_Interval to 10s
  • Decreased the Buffer_Chunk_Size & Buffer_Max_Size to 1k in the hope that it would force Fluent Bit to flush the logs more often.
  • Increased Buffer_Chunk_Size & Buffer_Max_Size to 1M, as I had read an article stating that Fluent Bit's "pause" callback does not work as expected.
  • Explicitly configured a Mem_Buf_Limit of 5M.
  • Tried Fluent Bit version 1.7, 1.6 and 1.5
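
For reference, a sketch of where the settings from this list sit in the config, shown for one of the two inputs. The values are the ones listed above; the 1M buffer sizes are shown, and the 1k attempt used the same keys. Which combinations were active at the same time is not stated, so treat this as an illustration rather than the exact file that was run:

[SERVICE]
    Flush     1

[INPUT]
    Name  tail
    Parser docker
    Path  /var/log/serviceA/*.log
    Tag   service.A
    DB    /var/db/ServiceA
    Refresh_Interval  10
    Buffer_Chunk_Size 1M
    Buffer_Max_Size   1M
    Mem_Buf_Limit     5M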

I've also used the debug versions of these containers to confirm that the files are mounted correctly into the container and that they contain all the logs (even when Fluent Bit does not pick them up).

Fluent Bit's log level has also been set to debug, but there are no hints or errors in the logs.

Has anybody else experienced this issue?

Dewald
  • Exactly the same issue here. Tried 1.7 and also 1.8. It picks up new files on the PV when they roll and tails the few lines that exist at that moment, then it stops tailing those files. So it does have access to the PV. No errors in the Fluent Bit debug logs. – Taner Aug 06 '21 at 10:59
  • I have the same issue, please help if you found the solution, thanks: https://stackoverflow.com/q/68712337/2704051 – Sunil Aug 13 '21 at 08:47
  • Having the same issue – Dam1tha Sep 14 '21 at 03:37
  • Which OS are you trying? – Furkan ÇELİKCİ May 25 '22 at 09:59

2 Answers

I am not sure, but I think Fluent Bit (1.7 and 1.8) has a bug when accessing shared logs on a PV. It has permissions and sees the files, but it does not fetch any log lines after its first fetch.

I solved it by running Fluent Bit as a sidecar rather than as a separate pod.

Taner

I have had the same issue with Fluent Bit on OpenShift using GlusterFS for persistent volumes.

My workaround has been to fork the official repo and build a new fluent-bit Docker image after making a small addition to the Dockerfile:

RUN cmake ... \
    ... \
    -DFLB_INOTIFY=Off \
    ..

However, in the meantime, I see that there is now a configuration parameter called Inotify_Watcher in the tail input documentation, which I guess can be used for exactly this.
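
A minimal, untested sketch of what that might look like for one of the inputs from the question, assuming a Fluent Bit version whose tail input supports Inotify_Watcher. Setting it to false should make the plugin fall back to stat-based file watching instead of inotify, which is the same effect the custom build above aims for:

[INPUT]
    Name  tail
    Parser docker
    Path  /var/log/serviceA/*.log
    Tag   service.A
    DB    /var/db/ServiceA
    Refresh_Interval 30
    Inotify_Watcher  false

The other keys are copied unchanged from the question's config; only the last line is new.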