
I'm seeing some very strange behaviour with Kafka (on AWS). After deploying an application to a k8s namespace, everything works as intended. After a redeployment, however, the following exception sometimes starts appearing and never resolves:

o.a.k.clients.producer.internals.Sender  : [] [Producer clientId=producer-1] Got error produce response with correlation id 136586 on topic-partition my.topic-5, retrying (2147483138 attempts left). Error: NOT_LEADER_OR_FOLLOWER 
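
For a sense of how long the producer had been retrying when this line was logged, the "attempts left" figure can be compared against `Integer.MAX_VALUE`. This is only a rough sketch: it assumes the producer's `retries` setting was left at `Integer.MAX_VALUE` (the question does not say), and the exact count may be off by one depending on how the client counts attempts. The class and method names are made up for illustration.

```java
class RetriesLeft {
    // Retries already consumed, ASSUMING the producer's "retries" setting
    // is Integer.MAX_VALUE (an assumption, not stated in the question).
    static long retriesConsumed(long attemptsLeft) {
        return Integer.MAX_VALUE - attemptsLeft;
    }

    public static void main(String[] args) {
        // Value taken from the log line above.
        System.out.println(retriesConsumed(2147483138L)); // prints 509
    }
}
```

Under that assumption the batch had already been retried several hundred times, which fits the observation that the error never clears on its own.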

What makes this even stranger is that the only resolution we've currently found is to destroy the k8s namespace and redeploy all of our applications.

Here's a list of everything we've tried:

  • changing the Kafka ack mode
  • deleting all topics
  • changing client.dns.lookup
  • running the application against the same cluster from a local machine (works fine)
  • changing the topic replication factor (from 1 to 2)
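
For reference, the two client-side settings in the list above might be set as in the sketch below. The values (`all`, `use_all_dns_ips`) and the bootstrap address are illustrative assumptions, since the question does not say which values were tried. Only `java.util.Properties` is used so the snippet stands alone; in a real application the same keys would be passed to the Kafka producer.

```java
import java.util.Properties;

class ProducerSettingsSketch {
    // Builds producer settings using standard Kafka client config keys.
    // The values chosen here are assumptions for illustration only.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "Ack mode": wait for all in-sync replicas to acknowledge a write.
        props.setProperty("acks", "all");
        // Try every IP address a broker hostname resolves to.
        props.setProperty("client.dns.lookup", "use_all_dns_ips");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address.
        build("broker-1.example.internal:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```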

Any ideas on what might be causing this and what the solution would be?

OneCricketeer
JimmyyW
  • I'd suggest looking into the k8s egress policies for your cluster/namespace – OneCricketeer Jun 16 '22 at 13:15
  • See if this helps https://stackoverflow.com/questions/61798565/kafka-producer-fails-to-send-messages-with-not-leader-for-partition-exception – teedak8s Jun 16 '22 at 23:46
  • @AkhilJain no luck with that one, looking into the egress policies now – JimmyyW Jun 17 '22 at 07:40
  • Hi @JimmyyW, I'm having the same problem. Did you manage to sort it out? – Minh Danh Aug 01 '22 at 08:09
  • @MinhDanh Yeah, this seems to have been fixed. We upped the memory limit on the pod; the metadata returned from the cluster may have been too large (we have a LOT of topics, and the pod/JVM memory was set very low). Running with trace logging showed metadata requests failing. We also made sure the Kafka and HTTPS ports were allowed in the k8s egress policies. – JimmyyW Aug 02 '22 at 09:09
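
The two checks mentioned in the last comment (JVM memory and egress to the broker ports) could be verified from inside the pod with something like the sketch below. It is a hypothetical diagnostic, not part of the original fix: the class name, the timeout, and the `host:port` arguments are assumptions. A successful TCP connect only shows that the port is reachable, not that Kafka metadata requests will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PodDiagnostics {
    // Returns true if a plain TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hostnames.
            return false;
        }
    }

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM max heap: " + maxHeapMiB() + " MiB");
        // Each argument is expected in host:port form, e.g. the broker addresses.
        for (String target : args) {
            int sep = target.lastIndexOf(':');
            String host = target.substring(0, sep);
            int port = Integer.parseInt(target.substring(sep + 1));
            System.out.println(target + " reachable: " + canConnect(host, port, 3000));
        }
    }
}
```

Running it with the broker addresses as arguments, once from the affected pod and once from a machine where the application works, would show whether the egress policy or the heap size differs between the two.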

0 Answers