
I am currently using the Basic cluster on Confluent Cloud and have a single topic with 9 partitions. I have a REST API set up with AWS Lambda that publishes messages to Kafka. While stress testing the pipeline at 5k-10k requests per second, I found that the latency to publish a 1 KB record shoots up to 20-30 seconds, whereas a single request normally takes about 300 ms. I added producer configurations such as linger.ms = 500 ms and batch.size = 100 KB and saw some improvement (15-20 seconds per request), but that still feels too high. Is there anything I am missing, or is it a limitation of the Basic cluster on Confluent Cloud? All cluster configurations are at their defaults.
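For reference, a minimal sketch of the producer settings described above (linger.ms, batch.size), assuming the Python confluent-kafka client; the broker address, credentials, and topic name below are placeholders, not values from the question:

```python
# Hedged sketch: batching-oriented producer config. Endpoint, credentials and
# topic name are placeholders; only linger.ms / batch.size come from the question.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<cluster>.confluent.cloud:9092",  # placeholder endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<api-key>",      # placeholder credential
    "sasl.password": "<api-secret>",   # placeholder credential
    "linger.ms": 500,                  # wait up to 500 ms to fill a batch
    "batch.size": 102400,              # ~100 KB batches, as in the question
})

def delivery_report(err, msg):
    # Called once per record; surfaces per-record publish errors and latency.
    if err is not None:
        print(f"Delivery failed: {err}")

# Publish a 1 KB record to a hypothetical topic, then wait for delivery.
producer.produce("product-views", value=b"x" * 1024, callback=delivery_report)
producer.flush()
```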

icyanide
  • It feels like the API request got throttled. – Chris Chen Jun 04 '21 at 01:33
  • Going by the Lambda function metrics, a few requests took 30 seconds. I don't think throttled API requests would show up in the Lambda metrics at all. – icyanide Jun 04 '21 at 01:38
  • How do you initiate the REST API request? What SDK or package are you using? The AWS SDK applies retries with an exponential back-off strategy when the earlier request gets throttled. – Chris Chen Jun 06 '21 at 00:10
  • The REST API is initiated from user devices whenever a product is viewed. I later realized that the API request is getting throttled. If I use the Kafka REST Proxy API instead, the performance is a lot better. – icyanide Jun 08 '21 at 01:03
  • So is this problem solved? Was the cause that the API request was throttled? – Chris Chen Jun 08 '21 at 05:08
  • That's correct. The current concurrency limit is 1000; when I execute 800 events, the average time is under a second. However, when I trigger 2k+ events, the average time shoots up, likely due to the exponential back-off. I have requested an increase to the service quota, which should solve the problem. Thanks a lot. – icyanide Jun 09 '21 at 05:50

1 Answer


I identified that the issue is the API request getting throttled. As Chris Chen mentioned, the AWS SDK's exponential back-off strategy on retries is what makes the average time shoot up. I have requested AWS to increase the concurrent execution limit, which should resolve the issue.
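As a side note on the back-off behaviour described above, here is a hedged sketch (assuming boto3 is the SDK invoking the Lambda function) of how the client-side retry policy can be tuned while the quota increase is pending; the function name, payload, and retry values are illustrative, not taken from the answer:

```python
# Hypothetical sketch: capping the AWS SDK's automatic retries so throttled
# Lambda invocations fail fast instead of backing off for tens of seconds.
import boto3
from botocore.config import Config

lambda_client = boto3.client(
    "lambda",
    config=Config(
        retries={
            "max_attempts": 2,   # fewer attempts = less retry-induced latency
            "mode": "standard",  # standard mode uses exponential backoff with jitter
        }
    ),
)

# Invoke a placeholder function with a sample payload.
response = lambda_client.invoke(
    FunctionName="publish-to-kafka",      # placeholder function name
    Payload=b'{"productId": "123"}',
)
print(response["StatusCode"])
```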

icyanide