I'm seeing some very strange behaviour with Kafka (on AWS). After deploying an application to a k8s namespace, everything works as intended. After a redeployment, however, the following exception sometimes starts appearing and never resolves:
o.a.k.clients.producer.internals.Sender : [] [Producer clientId=producer-1] Got error produce response with correlation id 136586 on topic-partition my.topic-5, retrying (2147483138 attempts left). Error: NOT_LEADER_OR_FOLLOWER
What makes this even stranger is that the only resolution we've currently found is to destroy the k8s namespace and redeploy all of our applications.
Here's a list of everything we've tried:
- changing the Kafka producer ack mode (acks)
- deleting all topics
- changing client.dns.lookup
- running the application against the same cluster from a local machine (works fine)
- changing the topic replication factor (from 1 to 2)
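
For reference, the two client-side settings in the list above are ordinary producer properties. Here is a minimal sketch of how they are set. The broker address and the class and method names are hypothetical, and the values shown are just valid examples, not necessarily the exact ones we used:

```java
import java.util.Properties;

public class ProducerSettingsSketch {

    // Builds the producer properties we varied while debugging.
    // The property keys are standard Kafka client config names;
    // the values are illustrative assumptions.
    static Properties producerSettings(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Ack mode: valid values are "0", "1" and "all".
        props.setProperty("acks", "all");
        // DNS lookup: valid values include "use_all_dns_ips" and
        // "resolve_canonical_bootstrap_servers_only".
        props.setProperty("client.dns.lookup", "use_all_dns_ips");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address, for illustration only.
        Properties props = producerSettings("broker-1.example.internal:9092");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

In a real application these properties would be passed to the KafkaProducer constructor together with the serializer settings. None of the combinations we tried made a difference to the error.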
Any ideas on what might be causing this and what the solution would be?