
I have a Kafka cluster of 3 nodes and I am using kafkacat to read data from it. I configured PLAINTEXT and VPN_PLAINTEXT listeners:

listeners=PLAINTEXT://0.0.0.0:6667,VPN_PLAINTEXT://0.0.0.0:6669
advertised.listeners=PLAINTEXT://hadoop-kafka1-stg.local.company.cloud:6667,VPN_PLAINTEXT://hadoop-kafka1-stg-vip.local.company.cloud:6669
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,VPN_PLAINTEXT:PLAINTEXT

We found that we cannot consume data from topics whose partition leader is node 1 (and only node 1). The request fails with this error:

kafkacat -C -b hadoop-kafka1-stg-vip.local.company.cloud:6669 -t <topic-name> -o beginning -e -q -p 11
% ERROR: Topic <topic-name> [11] error: Broker: Not leader for partition

I can see that node 1 is the leader for this partition:

Metadata for <topic-name> (from broker 3: hadoop-kafka3-stg-vip.local.company.cloud:6669/3):
 3 brokers:
  broker 2 at hadoop-kafka2-stg-vip.local.company.cloud:6669
  broker 3 at hadoop-kafka3-stg-vip.local.company.cloud:6669 (controller)
  broker 1 at hadoop-kafka1-stg-vip.local.company.cloud:6669
 1 topics:
  topic "<topic-name>" with 12 partitions:
    partition 0, leader 2, replicas: 2,1,3, isrs: 3,2,1
    partition 1, leader 3, replicas: 3,2,1, isrs: 3,2,1
    partition 2, leader 1, replicas: 1,3,2, isrs: 3,2,1
    partition 3, leader 2, replicas: 2,3,1, isrs: 3,2,1
    partition 4, leader 3, replicas: 3,1,2, isrs: 3,2,1
    partition 5, leader 1, replicas: 1,2,3, isrs: 3,2,1
    partition 6, leader 2, replicas: 2,1,3, isrs: 3,2,1
    partition 7, leader 3, replicas: 3,2,1, isrs: 3,2,1
    partition 8, leader 1, replicas: 1,3,2, isrs: 3,2,1
    partition 9, leader 2, replicas: 2,3,1, isrs: 3,2,1
    partition 10, leader 3, replicas: 3,1,2, isrs: 3,2,1
    partition 11, leader 1, replicas: 1,2,3, isrs: 3,2,1
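
To see at a glance which partitions are affected, the listing above can be grouped by leader. This is only a minimal sketch: it assumes the metadata text looks exactly like the kafkacat output shown here, and `partitions_by_leader` is a hypothetical helper name, not part of any Kafka tooling.

```python
import re
from collections import defaultdict

# Matches lines such as "partition 11, leader 1, replicas: 1,2,3, isrs: 3,2,1".
PARTITION_RE = re.compile(r"partition (\d+), leader (\d+),")


def partitions_by_leader(metadata_text):
    """Group partition numbers by leader broker id (hypothetical helper)."""
    leaders = defaultdict(list)
    for line in metadata_text.splitlines():
        match = PARTITION_RE.search(line)
        if match:
            partition = int(match.group(1))
            leader = int(match.group(2))
            leaders[leader].append(partition)
    return dict(leaders)


# A few lines copied from the metadata listing above.
SAMPLE = """\
    partition 0, leader 2, replicas: 2,1,3, isrs: 3,2,1
    partition 1, leader 3, replicas: 3,2,1, isrs: 3,2,1
    partition 2, leader 1, replicas: 1,3,2, isrs: 3,2,1
    partition 11, leader 1, replicas: 1,2,3, isrs: 3,2,1
"""

if __name__ == "__main__":
    for leader, partitions in sorted(partitions_by_leader(SAMPLE).items()):
        print(f"leader {leader}: partitions {partitions}")
```

Run against the full listing, the partitions grouped under leader 1 are the ones that fail on port 6669.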

I thought the data on the node could be corrupted, so I removed everything from the Kafka data directory (kafka_data_dir). When I started the daemon, I could see it syncing, but afterwards the issue persisted. There is nothing suspicious in the logs.

Could anybody help me find the root cause? Only node 1 encounters this issue. When I query the same node on port 6667, it works smoothly.

OneCricketeer
dorinand
  • Have you tried providing all the brokers or one of the others as your bootstrap? It doesn't matter who the leader is for the CLI command. That being said, something with your VPN is likely stopping the traffic – OneCricketeer Apr 25 '22 at 14:07
  • It does not matter which node I ask. When I ask node 1 for a partition whose leader is node 2 or 3, it works. That means the initial exchange between kafkacat and Kafka node 1 works fine. The case that does not work is when Kafka node 1 is the leader of the partition I ask for – dorinand Apr 25 '22 at 15:01
  • If you force kafka1 to become the controller, then what happens? – OneCricketeer Apr 25 '22 at 16:38

1 Answer


After a deeper investigation of the traffic with tcpdump, I found that the Kafka configuration was fine. When I asked node1 for a topic partition, tcpdump on node1 did not capture any packets; the requests had been sent to node3. Requests should be forwarded over Citrix to the right Kafka node based on the DNS name, but the forwarding configuration was wrong:

  • hadoop-kafka1-stg-vip.local.company.cloud -> node 3
  • hadoop-kafka2-stg-vip.local.company.cloud -> node 2
  • hadoop-kafka3-stg-vip.local.company.cloud -> node 3

That is why requests for partitions where node1 is not the leader worked, while requests for partitions where node1 is the leader failed with the message Broker: Not leader for partition: they were always forwarded to node3 by Citrix.
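
The effect can be reproduced with a toy model. This is not Kafka client code, just a sketch of the routing logic: a client fetches from the address the partition leader advertises, regardless of which broker it bootstrapped from, and the broker that actually receives the request rejects it if it is not the leader. The names `FORWARDING`, `ADVERTISED`, and `fetch_outcome` are made up for illustration; the mappings are taken from the forwarding table above and the metadata in the question.

```python
# Node that Citrix actually forwarded each VIP hostname to (from the tcpdump findings).
FORWARDING = {
    "hadoop-kafka1-stg-vip.local.company.cloud": 3,  # misrouted, should have been node 1
    "hadoop-kafka2-stg-vip.local.company.cloud": 2,
    "hadoop-kafka3-stg-vip.local.company.cloud": 3,
}

# Hostname each broker advertises on the VPN_PLAINTEXT listener (port 6669).
ADVERTISED = {
    1: "hadoop-kafka1-stg-vip.local.company.cloud",
    2: "hadoop-kafka2-stg-vip.local.company.cloud",
    3: "hadoop-kafka3-stg-vip.local.company.cloud",
}


def fetch_outcome(leader, forwarding=FORWARDING):
    """Return (node reached, status) for a fetch aimed at the given partition leader."""
    host = ADVERTISED[leader]   # the client connects to the leader's advertised address
    reached = forwarding[host]  # the network decides which node really gets the request
    status = "ok" if reached == leader else "Not leader for partition"
    return reached, status


if __name__ == "__main__":
    for leader in (1, 2, 3):
        reached, status = fetch_outcome(leader)
        print(f"leader {leader} -> reached node {reached}: {status}")
```

In this model only fetches for partitions led by node 1 fail, which matches the observed behaviour on port 6669, while port 6667 bypassed the Citrix forwarding and worked.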

dorinand