
Several months ago, we upgraded our lower environments to a newer version of EKS. Production has not been upgraded yet. Since the upgrade, some of our Kafka producers in the lower environments have shown degraded performance. We have not seen any such degradation in production, even though we have released new code there several times during this period. Because of this, we feel fairly confident that the problem is not in our code but is instead a configuration issue with either Kafka or Istio.

We have 32 pods, each running 4 Kafka producers, and each producer writes to its own partition, for 128 partitions in total. In production, all 128 producers write to Kafka at approximately the same rate and finish at approximately the same time. In the lower environment, some producers write at a rate similar to production, while others, including some in the same pod, write at a much lower rate.

Here's the graph showing the rate from production, with all producers writing between 10K and 15K per second:

[Image: per-producer write rate in production]

Here's the graph from our lower environment showing some producers writing above 10K per second, but many also writing as low as 5K per second:

[Image: per-producer write rate in the lower environment]
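To make the pattern in the two graphs easier to reason about, here is a minimal Python sketch of how per-producer rates could be grouped by pod to find pods that host both fast and slow producers. The function names, the `(pod, producer)` keys, the 10K cut-off, and the sample numbers are all assumptions for illustration; they are not taken from our metrics.

```python
from collections import defaultdict

# Hypothetical cut-off: producers below this rate (per second) count as slow.
SLOW_THRESHOLD = 10_000


def slow_producers_by_pod(rates, threshold=SLOW_THRESHOLD):
    """Map each pod to the list of its producers whose rate is below threshold.

    `rates` maps (pod, producer) -> observed write rate per second.
    """
    slow = defaultdict(list)
    for (pod, producer), rate in sorted(rates.items()):
        if rate < threshold:
            slow[pod].append(producer)
    return dict(slow)


def mixed_pods(rates, threshold=SLOW_THRESHOLD):
    """Return pods that contain at least one slow and one fast producer."""
    producers_per_pod = defaultdict(int)
    for pod, _producer in rates:
        producers_per_pod[pod] += 1
    slow = slow_producers_by_pod(rates, threshold)
    return sorted(
        pod for pod, producers in slow.items()
        if len(producers) < producers_per_pod[pod]
    )


# Made-up sample: two pods with four producers each.
sample_rates = {
    (0, 0): 12_000, (0, 1): 5_000, (0, 2): 13_000, (0, 3): 11_000,
    (1, 0): 12_000, (1, 1): 12_500, (1, 2): 14_000, (1, 3): 10_500,
}
```

With real data, a non-empty result from `mixed_pods` would match what we see in the lower environment: slow and fast producers sharing the same pod, which suggests the slowdown is not simply a per-pod or per-node problem.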

I've compared the Kafka configs between prod and our performance environment, where we run these tests, and I can see some slight differences in how the listeners are defined, but I don't know enough about Kafka or Kubernetes to tell whether they could be causing the problem.
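For anyone who wants to repeat the comparison, this is roughly how two configs can be diffed key by key once they are loaded into Python dictionaries. It is only a sketch: `flatten`, `diff_configs`, and the sample listener values are hypothetical and do not reproduce our actual files.

```python
MISSING = "<missing>"  # marker for a key present on only one side


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into {dotted.path: scalar value}."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = node
    return flat


def diff_configs(left, right):
    """Return {path: (left_value, right_value)} for every differing path."""
    a, b = flatten(left), flatten(right)
    return {
        path: (a.get(path, MISSING), b.get(path, MISSING))
        for path in sorted(a.keys() | b.keys())
        if a.get(path, MISSING) != b.get(path, MISSING)
    }


# Made-up listener definitions, for illustration only.
prod = {"spec": {"kafka": {"listeners": [
    {"name": "plain", "port": 9092, "type": "internal", "tls": False},
]}}}
perf = {"spec": {"kafka": {"listeners": [
    {"name": "plain", "port": 9092, "type": "internal", "tls": True},
]}}}

listener_diff = diff_configs(prod, perf)
```

Because list entries are compared by position, a listener that is merely reordered would show up as a difference, so the output still needs to be read with some care.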

Any help or ideas would be greatly appreciated.

Here's the Strimzi operator from our production environment:

https://privnote.com/aVeLrwBF#nZatzZTjT

And here's the Strimzi operator from our performance environment:

https://privnote.com/UH4ReWkb#JSvuD27nF

Here's our Kafka YAML from production:

https://privnote.com/cB5w5w4R#moAC9BvZx

And here it is from our performance environment:

https://privnote.com/bs67Ld7d#zsZ6JIuiY

Finally, here's our Kafka ConfigMap from production:

https://privnote.com/JErf2a99#XWThgvENY

And here it is from the performance environment:

https://privnote.com/6beSBSOa#OCc7GBvjf
