
Environment:
Kubernetes Cluster: EKS
Logging Agent: FluentBit version 1.2
Destination for FluentBit: AWS Kinesis Firehose delivery stream
FluentBit output plugin: amazon-kinesis-firehose-for-fluent-bit

Description:
We have a setup where FluentBit (deployed as a DaemonSet) pushes logs to a Firehose delivery stream. There are 4 FluentBit pods (one per node/EC2 instance in the EKS cluster) collecting logs and submitting them to the same Firehose. We are in the Canada (Central) region, where Firehose has a limit of 1 MB/s per delivery stream. We were getting multiple throttling errors from Firehose. The data being sent is not huge; in CloudWatch I see that, apart from some occasional spikes over 1 MB, the consumption is quite low most of the time.

I'm really wondering, is this the right setup? Ingesting logs from separate FluentBit pods directly into one Firehose delivery stream (the Firehose destination is S3)? The options to control the data outflow rate from FluentBit and the amazon-kinesis-firehose-for-fluent-bit output plugin are very limited. Limitations (a rough configuration sketch follows the list):

  1. In the output plugin, I can't control the data outflow rate to Firehose.
  2. If I set a limit on the input plugin and the FluentBit service, it squeezes each FluentBit agent's capacity to buffer and push logs.
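
Roughly, the knobs I have look like the sketch below; the tag, paths, rate numbers, and stream name are illustrative placeholders, not our real values. The throttle filter and Mem_Buf_Limit are the only levers I see, and neither gives a direct MB/s cap.

    [SERVICE]
        # flush buffered chunks every 5 seconds
        Flush             5

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Tag               app.*
        # caps each agent's in-memory buffer (limitation 2 above)
        Mem_Buf_Limit     5MB

    [FILTER]
        Name              throttle
        Match             app.*
        # the throttle filter counts records per Interval, not bytes,
        # so there is no direct MB/s control (limitation 1 above)
        Rate              800
        Window            5
        Interval          1s

    [OUTPUT]
        Name              firehose
        Match             app.*
        region            ca-central-1
        delivery_stream   my-delivery-stream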

I feel that if there were one aggregator collecting logs from all FluentBit agents, with only one point of ingestion into the Kinesis delivery stream, it would be easier to control.
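
Something like the following wiring is what I have in mind, a sketch only, using FluentBit's forward input/output; the aggregator Service name, port, and stream name are made up for illustration.

    # On each DaemonSet agent: forward to the aggregator instead of Firehose
    [OUTPUT]
        Name    forward
        Match   *
        Host    log-aggregator.logging.svc.cluster.local
        Port    24224

    # On the single aggregator: receive from the agents and be the only
    # point of ingestion into the Firehose delivery stream
    [INPUT]
        Name    forward
        Listen  0.0.0.0
        Port    24224

    [OUTPUT]
        Name             firehose
        Match            *
        region           ca-central-1
        delivery_stream  my-delivery-stream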

What would you suggest?

Akshay Hiremath
  • After opening an issue here https://github.com/aws/amazon-kinesis-firehose-for-fluent-bit/issues/119 I found that the version of the FluentBit output plugin for Firehose that we are using is 1.2. Retries for throttled records were added in a later version of the output plugin. So with the later version this issue should not be as prominent, but it would still need better handling of backpressure and buffering. – Akshay Hiremath Jun 06 '21 at 19:52

1 Answer


If the observed throttling is transient in nature, then backoff and retry would be the best option. But if you see regular throttling, you can submit a limit increase request for Firehose using the limit increase form referenced in the AWS documentation: https://docs.aws.amazon.com/firehose/latest/dev/limits.html
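
For the transient case, once you are on a plugin version that actually signals retries on throttling, FluentBit's generic retry settings can be tuned. A rough sketch, assuming a recent FluentBit release (the scheduler.* keys only exist in newer versions) and an illustrative stream name:

    [SERVICE]
        # exponential backoff between retries; these scheduler.* keys
        # are only available in newer FluentBit releases
        scheduler.base   5
        scheduler.cap    60

    [OUTPUT]
        Name             firehose
        Match            *
        region           ca-central-1
        delivery_stream  my-delivery-stream
        # generic FluentBit output property: how many times a chunk that the
        # plugin flags for retry (e.g. on throttling) will be re-attempted
        Retry_Limit      5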

Ajinkya
  • Amazon's Firehose output plugin doesn't seem to retry on failure. Ideally it should, but in the earlier versions of this plugin I see there is no retry. Increasing the limit is not a good solution because the overall traffic is much less than 1 MB/s. The ideal solution is to regulate the inflow traffic to Firehose to avoid the microbursts. I'm looking for ways to do that within the given constraints of the plugin, FluentBit, and the environment. – Akshay Hiremath May 02 '21 at 18:27