Does AWS Kinesis Producer Library aggregate data in memory?

Question

The AWS Kinesis Producer Library (KPL) can be configured to aggregate records before sending them to an AWS Kinesis stream. For example, we may set:

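    final KinesisProducerConfiguration config = new KinesisProducerConfiguration();
    // Maximum time, in milliseconds, a record may sit in the buffer before it is sent.
    config.setRecordMaxBufferedTime(1000);
    // Maximum number of user records packed into one aggregated record.
    config.setAggregationMaxCount(100);
    config.setRegion("eu-west-1");
    return config;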

Is this buffer held only in memory, or is it saved to the file system? Mainly, I would like to know whether the current buffer is lost when the producer node is restarted.

Without looking at the code, I can almost certainly guarantee it's in memory - persisting it to disk by default would **hugely** impact performance of the library. (That being said, you can easily inspect the code to find out for sure). — Krease, Oct 22 '18 at 22:32
Yes, the Java KPL not only aggregates the records you push before sending them to the Kinesis service, but IIRC it also holds on to the records until it receives a "Successful" response back from the service. The KPL was designed with its own backoff/retry capabilities built in. It actually has two components: the Java library and a native (C++) component that handles the actual network I/O. — Brooks, Mar 13 '19 at 17:34
I don't know exactly how the data is held in memory between the two components, but I do know that you can overwhelm the KPL, which will result in out-of-memory errors. The key is to slow down your processing... If you shut down the producer node, then yes, its buffer will be lost. To my knowledge, there is no persistence beyond memory. — Brooks, Mar 13 '19 at 17:34
"I would like to know, if the current buffer is lost when the producer node is restarted" - really good question. Surprised there have been no definitive answers for this. — NullPumpkinException, May 13 '22 at 01:20

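The two points raised in the comments (slow down when too much is outstanding, and anything still buffered is lost when the process stops) can be illustrated with a small self-contained sketch. It does not use the KPL itself: `InMemoryProducer`, `addWithBackpressure`, and the threshold of 100 are hypothetical names and values chosen for illustration. With the real library, the analogous calls on `KinesisProducer` would presumably be `getOutstandingRecordsCount()` to check the backlog and `flushSync()` to drain it before an orderly shutdown. A shutdown hook only helps on a graceful stop; a hard kill or a crash would still lose the in-memory buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BufferedProducerSketch {

    /** Hypothetical stand-in for a producer that buffers records in memory only. */
    static class InMemoryProducer {
        private final Queue<String> buffer = new ArrayDeque<>();
        private final List<String> delivered = new ArrayList<>();

        /** Buffers a record; nothing is sent yet. */
        void add(String record) {
            buffer.add(record);
        }

        /** Number of records buffered but not yet delivered. */
        int outstanding() {
            return buffer.size();
        }

        /** Delivers everything currently buffered and returns when the buffer is empty. */
        void flushSync() {
            while (!buffer.isEmpty()) {
                delivered.add(buffer.poll());
            }
        }

        List<String> delivered() {
            return delivered;
        }
    }

    /** Adds a record, draining the buffer first if the backlog has reached the limit. */
    static void addWithBackpressure(InMemoryProducer producer, String record, int maxOutstanding) {
        if (producer.outstanding() >= maxOutstanding) {
            producer.flushSync();
        }
        producer.add(record);
    }

    public static void main(String[] args) {
        InMemoryProducer producer = new InMemoryProducer();

        // Flush on orderly JVM shutdown; a hard kill would skip this hook entirely.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            producer.flushSync();
            System.out.println("delivered at shutdown: " + producer.delivered().size());
        }));

        for (int i = 0; i < 250; i++) {
            addWithBackpressure(producer, "record-" + i, 100);
        }
        System.out.println("still buffered when main ends: " + producer.outstanding());
    }
}
```

In this sketch the backlog is drained by flushing synchronously for simplicity; with a real asynchronous producer, a common alternative is to pause briefly while the outstanding count stays above the limit.
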
0 Answers