I know the information below may not be enough to fully trace the issue, but I would still appreciate some guidance.
We have an Amazon EKS cluster. Currently, we are facing intermittent reachability issues with the Kafka pods.
Environment:
- 10 nodes in total, spread across Availability Zones ap-south-1a and ap-south-1b
- Three-replica Kafka cluster (installed via Helm chart)
- Three-replica ZooKeeper ensemble (installed via Helm chart)
- Kafka uses an external advertised listener on port 19092
- Kafka is exposed through a Service backed by an internal Network Load Balancer (see the inspection sketch after this list)
- A test-pod is deployed to check reachability of the Kafka pods
- We use Cloud Map based DNS for the advertised listener
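A minimal sketch of how I inspect this Service and its endpoints, assuming the chart installed everything into a kafka namespace with a Service named kafka-external (both names are illustrative, not the exact ones used here):

    # Namespace "kafka" and Service "kafka-external" are assumptions from a typical Helm install
    kubectl get svc kafka-external -n kafka -o wide    # NLB-backed Service exposing port 19092
    kubectl get endpoints kafka-external -n kafka      # should list all three broker pod IPs
    kubectl get pods -n kafka -o wide                  # broker pods and the nodes they are scheduled on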
Working:
- When I run the following telnet command from an EC2 instance, it works as expected (10.0.1.45 is the load balancer IP):
    telnet 10.0.1.45 19092
- When I run the following telnet command from an EC2 instance, it also works as expected (10.0.1.69 is an actual node's IP and 31899 is the NodePort):
    telnet 10.0.1.69 31899
Problem:
- When I run the same command from the test-pod:
    telnet 10.0.1.45 19092
  it sometimes works, and sometimes fails with:
    telnet: Unable to connect to remote host: Connection timed out
I suspect the issue is related to kube-proxy, but I need help confirming and resolving it. A sketch of the diagnostics I am considering is below.
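These are the checks I plan to run, assuming the default EKS layout where kube-proxy runs as a DaemonSet labelled k8s-app=kube-proxy in kube-system (the kafka namespace is again an assumption):

    # kube-proxy health and recent logs (default EKS label: k8s-app=kube-proxy)
    kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
    kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100

    # Confirm the Service still has healthy endpoints at the moment a timeout occurs
    kubectl get endpoints -n kafka

    # Repeat the test from pods scheduled on different nodes to see if failures follow a node
    kubectl exec -it test-pod -- telnet 10.0.1.45 19092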
Can anyone guide me? Is it safe to restart kube-proxy, and does restarting it affect other pods/deployments?
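If a restart is the right call, this is the approach I am considering, assuming the default EKS setup where kube-proxy runs as a DaemonSet named kube-proxy in kube-system; please correct me if it is unsafe:

    # Rolling restart of kube-proxy; pods are recreated node by node
    kubectl rollout restart daemonset kube-proxy -n kube-system
    kubectl rollout status daemonset kube-proxy -n kube-system

    # Or restart kube-proxy on a single suspect node first (placeholder pod name)
    kubectl delete pod -n kube-system <kube-proxy-pod-on-that-node>

My understanding is that established connections keep working while each node's kube-proxy pod restarts, but I would like confirmation that this does not disrupt other pods or deployments.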