Scenario:

You write a program in R or Python that needs to run on Linux or Windows, and you want to log its std-out (both JSON-structured and unstructured) and its std-error (mostly unstructured) to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.

Question:

How do you send "logs" from a bunch of programs to a Fluentd instance without performing a curl call for every log entry that your application was originally writing to std-out?

When a UDP or TCP 'connection' is necessary for the application to run, the application seems to become harder to debug, and the std-out of any dependency of your program has to be parsed just to get its logging passed through.

Thoughts:

Alternatively, a question could be, how to accept a 'connection' object which can either point to a file or to a TCP connection? So that switching between the std-out or a TCP destination is a matter of changing a single value?
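
A minimal Python sketch of that 'connection' idea, under assumptions that are not part of the original setup: the destination is a single string, where "-" means std-out, "tcp://host:port" means a TCP endpoint, and anything else is a file path. The names `open_log_sink` and `log_line` are hypothetical, and what the receiving side expects on the TCP port is left open.

```python
import socket
import sys


def open_log_sink(destination):
    """Return a writable text stream for a destination string.

    "-"               -> standard output
    "tcp://host:port" -> a TCP connection wrapped as a text stream
    anything else     -> a file path, opened in append mode
    (This naming scheme is an assumption made for the sketch.)
    """
    if destination == "-":
        return sys.stdout
    if destination.startswith("tcp://"):
        host, _, port = destination[len("tcp://"):].rpartition(":")
        conn = socket.create_connection((host, int(port)))
        # makefile() gives the socket the same write()/flush() interface as a file.
        stream = conn.makefile("w", encoding="utf-8", newline="\n")
        # Drop our own handle; the OS socket stays open until the stream is closed.
        conn.close()
        return stream
    return open(destination, "a", encoding="utf-8")


def log_line(sink, message):
    """Write one log entry as a single line and flush it immediately."""
    sink.write(message + "\n")
    sink.flush()
```

With this shape, switching between std-out, a file and a TCP destination is a matter of changing the one string passed to `open_log_sink`, for example from a command-line argument or an environment variable.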

I like the 'tail' input plugin, which could be what I am looking for, but then:

  1. the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour), and
  2. it seems to require reconfiguring Fluentd for every new program that you start on that server (if it logs to another file); I would highly prefer to keep that configuration on the program side...

I built an EFK stack with the Docker log driver set to fluentd, which does not seem to be an optimal, solid solution either, but without Docker I already get kind of stuck setting up a basic configuration (not referring to fluent.conf here).


1 Answer

TL;DR

  • std-out -> fluentd: Redirect the program output to a file when launching your program. On Linux, use logrotate; you will love it.
  • Windows: use fluent-bit.
  • App side config: use a single (or predictable) log location and the fluentd/fluent-bit 'in_tail' plugin.

Logging in general:

It is recommended to always write application output to a file. If the std-out must end up in a file, redirect the output at program startup. For more flexibility in the fluentd configuration, redirect std-out and std-error to separate files (just like 'Apache' does):

My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt

This opens the option for fluentd to read from this/these file(s).
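
If the program is started from a Python launcher script rather than from a shell, the same redirection can be sketched as below. The helper name `run_with_log_files` is hypothetical. Note one deliberate difference: the files are opened in append mode, whereas `>` in the command above truncates them.

```python
import subprocess


def run_with_log_files(command, out_path, err_path):
    """Run `command`, sending its std-out and std-error to two separate files.

    `command` is a list such as ["My_program.exe", "arg1", "arg2"].
    Both files are opened in append mode so earlier log lines are kept.
    """
    with open(out_path, "ab") as out_file, open(err_path, "ab") as err_file:
        completed = subprocess.run(command, stdout=out_file, stderr=err_file)
    # The exit code is returned so the launcher can react to failures.
    return completed.returncode
```

Calling `run_with_log_files(["My_program.exe", "Do", "some", "crazy", "stuff"], "my_out_file.txt", "my_error_file.txt")` would then leave both files ready to be tailed.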

Windows:

For Windows systems, use fluent-bit; it likely solves the problem of aggregating Windows OS program logs. Support for Windows was implemented only recently.

fluent-bit supports:

  1. The 'tail' plugin, which records the 'inode' value (a unique file identifier that is insensitive to renaming) and the 'index' value (called 'pos' in the full-blown 'fluentd' application) in an SQLite3 database. It also deals with un-processable data, which is assigned to a certain key ('log' by default).
  2. Windows machines, but note that it cannot buffer to disk there, so make sure that a lost connection, or any other issue with the output, is reestablished or fixed in time, so that you do not run into OOM issues.
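
To illustrate why storing the inode together with a read position survives renames (for example by logrotate), here is a rough Python sketch of the idea. It is not fluent-bit's actual implementation or database schema; the table layout and the name `read_new_lines` are assumptions made for the example.

```python
import os
import sqlite3


def read_new_lines(log_path, db_path):
    """Return the complete lines appended to log_path since the previous call.

    The read position is stored per inode in a small SQLite file, so a
    renamed file is still recognised as the same file.
    """
    db = sqlite3.connect(db_path)
    try:
        db.execute(
            "CREATE TABLE IF NOT EXISTS positions "
            "(inode TEXT PRIMARY KEY, offset INTEGER NOT NULL)"
        )
        with open(log_path, "rb") as handle:
            info = os.fstat(handle.fileno())
            inode = str(info.st_ino)  # stored as text to avoid integer range issues
            row = db.execute(
                "SELECT offset FROM positions WHERE inode = ?", (inode,)
            ).fetchone()
            offset = row[0] if row else 0
            if offset > info.st_size:
                offset = 0  # the file was truncated, so start from the beginning
            handle.seek(offset)
            data = handle.read()
        # Consume only complete lines; a trailing partial line waits for the next call.
        end = data.rfind(b"\n") + 1
        lines = [raw.decode("utf-8", errors="replace") for raw in data[:end].splitlines()]
        db.execute(
            "INSERT OR REPLACE INTO positions (inode, offset) VALUES (?, ?)",
            (inode, offset + end),
        )
        db.commit()
        return lines
    finally:
        db.close()
```

Because the lookup key is the inode rather than the file name, calling the function on a rotated file such as `app.log.1` continues where reading of `app.log` stopped instead of re-sending old lines.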

App side config:

The tail plugin can monitor a folder, which makes it practical to keep the configuration on the side of your program. Just make sure that your different applications write their logs to a predictable directory.
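
On the program side, this could look like the following Python sketch. The directory, the file naming (`<program>-<pid>.log`) and the JSON field names are all assumptions chosen for illustration; the only point is that every program and every instance lands in one directory that a single wildcard path in the tail configuration can cover.

```python
import json
import logging
import os


class JsonLineFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_file_logger(program_name, log_dir):
    """Create a logger that writes JSON lines to log_dir/<program>-<pid>.log."""
    os.makedirs(log_dir, exist_ok=True)
    # Including the process id keeps parallel instances in separate files.
    path = os.path.join(log_dir, "%s-%d.log" % (program_name, os.getpid()))
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(program_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, path
```

A new program then only needs to call `make_file_logger("my_program", log_dir)` with the shared directory; nothing changes on the fluentd/fluent-bit side as long as the tail input points at that directory with a wildcard.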

Fluent-bit setup/config:

For Linux, just use fluentd (unless > 100000 messages per second are required, which is where fluent-bit becomes your only choice).

For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).

There are 2 execution methods:

  1. Providing configuration directly via the commandline
  2. Using a config file (example included in zip), and referring to it with the -c flag.

Directly from commandline

Some example executions (without making use of the option to work with a configuration file) can be found here:

PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'

-i declares the input method. Currently, only a few plugins have been implemented; see the help output below.

PS fluent-bit.exe --help

Available Options
  -b  --storage_path=PATH       specify a storage buffering path
  -c  --config=FILE     specify an optional configuration file
  -f, --flush=SECONDS   flush timeout in seconds (default: 5)
  -F  --filter=FILTER    set a filter
  -i, --input=INPUT     set an input
  -m, --match=MATCH     set plugin match, same as '-p match=abc'
  -o, --output=OUTPUT   set an output
  -p, --prop="A=B"      set plugin configuration property
  -R, --parser=FILE     specify a parser configuration file
  -e, --plugin=FILE     load an external plugin (shared lib)
  -l, --log_file=FILE   write log info to a file
  -t, --tag=TAG         set plugin tag, same as '-p tag=abc'
  -T, --sp-task=SQL     define a stream processor task
  -v, --verbose         increase logging verbosity (default: info)
  -s, --coro_stack_size Set coroutines stack size in bytes (default: 98302)
  -q, --quiet           quiet mode
  -S, --sosreport       support report for Enterprise customers
  -V, --version         show version number
  -h, --help            print this help

Inputs
  tail                  Tail files
  dummy                 Generate dummy data
  statsd                StatsD input plugin
  winlog                Windows Event Log
  tcp                   TCP
  forward               Fluentd in-forward
  random                Random

Outputs
  counter               Records counter
  datadog               Send events to DataDog HTTP Event Collector
  es                    Elasticsearch
  file                  Generate log file
  forward               Forward (Fluentd protocol)
  http                  HTTP Output
  influxdb              InfluxDB Time Series
  null                  Throws away events
  slack                 Send events to a Slack channel
  splunk                Send events to Splunk HTTP Event Collector
  stackdriver           Send events to Google Stackdriver Logging
  stdout                Prints events to STDOUT
  tcp                   TCP Output
  flowcounter           FlowCounter

Filters
  aws                   Add AWS Metadata
  expect                Validate expected keys and values
  record_modifier       modify record
  rewrite_tag           Rewrite records tags
  throttle              Throttle messages using sliding window algorithm
  grep                  grep events by specified field values
  kubernetes            Filter to append Kubernetes metadata
  parser                Parse events
  nest                  nest events by specified field values
  modify                modify records by applying rules
  lua                   Lua Scripting Filter
  stdout                Filter events to STDOUT