
I am currently using the Basic cluster on Confluent Cloud and have a single topic with 9 partitions. I have a REST API set up with AWS Lambda that publishes messages to Kafka. While stress testing the pipeline at 5k-10k requests per second, I found that the latency to publish a 1 KB record shoots up to 20-30 seconds, whereas a single request normally takes about 300 ms. I added producer configurations such as linger.ms = 500 ms and batch.size = 100 KB and saw some improvement (15-20 seconds per request), but that still feels too high. Is there anything I am missing, or is it a limitation of the Basic cluster on Confluent Cloud? All cluster configurations are at their defaults.
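For reference, a minimal sketch of the producer settings described above (linger.ms, batch.size), assuming the Python confluent-kafka client; the broker address, credentials, and topic name below are placeholders, not values from the question:

```python
# Hedged sketch: batching-oriented producer config. Endpoint, credentials and
# topic name are placeholders; only linger.ms / batch.size come from the question.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<cluster>.confluent.cloud:9092",  # placeholder endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<api-key>",      # placeholder credential
    "sasl.password": "<api-secret>",   # placeholder credential
    "linger.ms": 500,                  # wait up to 500 ms to fill a batch
    "batch.size": 102400,              # ~100 KB batches, as in the question
})

def delivery_report(err, msg):
    # Called once per record; surfaces per-record publish errors and latency.
    if err is not None:
        print(f"Delivery failed: {err}")

# Publish a 1 KB record to a hypothetical topic, then wait for delivery.
producer.produce("product-views", value=b"x" * 1024, callback=delivery_report)
producer.flush()
```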

icyanide
  • It feels like the API request got throttled. – Chris Chen Jun 04 '21 at 01:33
  • Going by the Lambda function metrics, a few requests took 30 seconds. I don't think throttled API requests would show up in the Lambda metrics at all. – icyanide Jun 04 '21 at 01:38
  • How do you initiate the REST API request? What SDK or package are you using? The AWS SDK applies retries with an exponential back-off strategy when the earlier request gets throttled. – Chris Chen Jun 06 '21 at 00:10
  • The REST API is initiated from user devices whenever a product is viewed. I later realized that the API request is getting throttled. If I use the Kafka REST Proxy API instead, the performance is a lot better. – icyanide Jun 08 '21 at 01:03
  • So is this problem solved? Was the cause that the API request was throttled? – Chris Chen Jun 08 '21 at 05:08
  • That's correct. The current concurrency limit is 1000; when I execute 800 events, the average time is under a second. However, when I trigger 2k+ events, the average time shoots up, likely due to the exponential back-off. I have requested an increase to the service quota, which should solve the problem. Thanks a lot. – icyanide Jun 09 '21 at 05:50

1 Answer


I identified that the issue is the API request getting throttled. As Chris Chen mentioned, the AWS SDK's exponential back-off strategy on retries is what makes the average time shoot up. I have requested AWS to increase the concurrent execution limit, which should resolve the issue.
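As a side note on the back-off behaviour described above, here is a hedged sketch (assuming boto3 is the SDK invoking the Lambda function) of how the client-side retry policy can be tuned while the quota increase is pending; the function name, payload, and retry values are illustrative, not taken from the answer:

```python
# Hypothetical sketch: capping the AWS SDK's automatic retries so throttled
# Lambda invocations fail fast instead of backing off for tens of seconds.
import boto3
from botocore.config import Config

lambda_client = boto3.client(
    "lambda",
    config=Config(
        retries={
            "max_attempts": 2,   # fewer attempts = less retry-induced latency
            "mode": "standard",  # standard mode uses exponential backoff with jitter
        }
    ),
)

# Invoke a placeholder function with a sample payload.
response = lambda_client.invoke(
    FunctionName="publish-to-kafka",      # placeholder function name
    Payload=b'{"productId": "123"}',
)
print(response["StatusCode"])
```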

icyanide