23

I am using a Kafka 0.9 server with the kafka-clients 0.9 consumer and a 0.8.2 producer.

Everything is working great, except that the consumer logs a lot of INFO messages saying the coordinator is dead:

2016-02-25 19:30:45.046  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.048  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.049  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.050  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.051  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.052  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.053  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.054  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.055  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.056  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.057  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.058  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.059  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.060  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.061  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.062  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.062  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.063  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.064  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.065  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.066  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.067  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.068  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.068  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.069  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.070  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.071  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.072  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.072  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.073  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.074  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.075  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.075  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.076  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.077  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.078  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.079  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.079  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.080  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.081  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.082  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.083  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.083  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.084  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.085  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.086  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.086  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.087  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.088  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.089  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.089  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.090  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.091  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.093  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.094  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-02-25 19:30:45.094  INFO 10263 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.

I also noticed that the producer disconnects and reconnects every 10 minutes, as shown below:

2016-03-12 15:55:36 INFO  [pool-1-thread-1] - Fetching metadata from broker id:0,host:192.168.72.30,port:9092 with correlation id 41675 for 1 topic(s) Set(act)
2016-03-12 15:55:36 INFO  [pool-1-thread-1] - Connected to 192.168.72.30:9092 for producing
2016-03-12 15:55:36 INFO  [pool-1-thread-1] - Disconnecting from 192.168.72.30:9092
2016-03-12 15:55:36 INFO  [pool-1-thread-1] - Disconnecting from kafkauk.XXXXXXXXXX.co:9092
2016-03-12 15:55:36 INFO  [pool-1-thread-1] - Connected to kafkauk.XXXXXXXXXX.co:9092 for producing

This is my producer configuration:

metadata.broker.list=192.168.72.30:9092
serializer.class=kafka.serializer.StringEncoder
request.required.acks=1
linger.ms=2000
batch.size=500

And this is the consumer configuration:

bootstrap.servers: kafkauk.xxxxxxxx.co:9092
group.id: cdrServer
client.id: cdrServer
enable.auto.commit: true
auto.commit.interval.ms: 1000
session.timeout.ms: 30000
key.deserializer: org.apache.kafka.common.serialization.StringDeserializer
value.deserializer: org.apache.kafka.common.serialization.StringDeserializer

I could not figure out what these messages mean. Should I ignore them, or am I missing something in the configuration?


After I changed the Kafka log level to DEBUG on the consumer, I found the following:

2016-03-13 18:21:55.586 DEBUG 5469 --- [      cdrServer] org.apache.kafka.clients.NetworkClient   : Node 2147483647 disconnected.
2016-03-13 18:21:55.586  INFO 5469 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Marking the coordinator 2147483647 dead.
2016-03-13 18:21:55.586 DEBUG 5469 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Issuing group metadata request to broker 0
2016-03-13 18:21:55.586 DEBUG 5469 --- [      cdrServer] org.apache.kafka.clients.NetworkClient   : Sending metadata request ClientRequest(expectResponse=true, callback=null, request=RequestSend(header={api_key=3,api_version=0,correlation_id=183025,client_id=cdrServer}, body={topics=[act]}), isInitiatedByNetworkClient, createdTimeMs=1457893315586, sendTimeMs=0) to node 0
2016-03-13 18:21:55.591 DEBUG 5469 --- [      cdrServer] org.apache.kafka.clients.Metadata        : Updated cluster metadata version 296 to Cluster(nodes = [Node(0, kafkauk.xxxxxxxxx.co, 9092)], partitions = [Partition(topic = act, partition = 0, leader = 0, replicas = [0,], isr = [0,]])
2016-03-13 18:21:55.592 DEBUG 5469 --- [      cdrServer] o.a.k.c.c.internals.AbstractCoordinator  : Group metadata response ClientResponse(receivedTimeMs=1457893315592, disconnected=false, request=ClientRequest(expectResponse=true, callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@1e2de777, request=RequestSend(header={api_key=10,api_version=0,correlation_id=183024,client_id=cdrServer}, body={group_id=cdrServer}), createdTimeMs=1457893315586, sendTimeMs=1457893315586), responseBody={error_code=0,coordinator={node_id=0,host=kafkauk.xxxxxxxx.co,port=9092}})

I am not sure it is a network problem, because it happens exactly every 9 minutes.


Update

I found that this is directly related to

connections.max.idle.ms: 300000

Whatever value I set here, the consumer gets disconnected after exactly that interval.
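The 9-minute cadence seen before the update is consistent with the 0.9 Java consumer's default for `connections.max.idle.ms`, which is 540000 ms. The following is a minimal sketch, using only `java.util.Properties`, of how the setting could be made explicit next to the other consumer properties from the question. The class and method names (`IdleTimeoutConfig`, `consumerProps`, `toMinutes`) are hypothetical, and passing the result to a `KafkaConsumer` is not shown.

```java
import java.util.Properties;

public class IdleTimeoutConfig {
    // Default idle timeout of the 0.9 Java consumer, in milliseconds (9 minutes).
    static final long DEFAULT_IDLE_MS = 9 * 60 * 1000L;

    // Builds consumer properties mirroring the question, with an explicit idle timeout.
    static Properties consumerProps(long idleMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafkauk.xxxxxxxx.co:9092");
        props.setProperty("group.id", "cdrServer");
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("connections.max.idle.ms", Long.toString(idleMs));
        return props;
    }

    // Converts a millisecond setting to whole minutes, for comparison with the log cadence.
    static long toMinutes(long ms) {
        return ms / 60000L;
    }

    public static void main(String[] args) {
        System.out.println(toMinutes(DEFAULT_IDLE_MS) + " min (client default)");
        System.out.println(toMinutes(300000L) + " min (value from the update)");
        System.out.println(consumerProps(300000L).getProperty("connections.max.idle.ms"));
    }
}
```

Raising the value only makes the message less frequent; the client still closes a connection once it has been idle that long.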

Shahbour
  • 2
    In my experience, manual partition assignment and external offset management in Kafka, though in theory supported, are difficult and problematic. It seems simple, but a production-stable implementation requires lots of workarounds. –  Jun 28 '17 at 04:11
  • These are info messages only and may not necessarily affect the successful running of Kafka – Chris Halcrow Jun 04 '21 at 07:13

7 Answers

11

Marking the coordinator dead happens when there is a network communication error between the consumer client and the coordinator (it can also happen when the coordinator dies and the group needs to rebalance). A variety of requests (offset commit, offset fetch, etc.) can trigger it. I suggest you investigate what is causing these situations.

Nautilus
  • 1
    I changed my network setup so that both the consumer and the server are on the same LAN, but I am still getting the above – Shahbour Mar 04 '16 at 10:00
  • Any idea how I can debug this? It still exists – Shahbour Mar 11 '16 at 23:49
  • Can you provide more information? I can only point out what that error means based on the information you are providing. – Nautilus Mar 12 '16 at 10:19
  • I updated the question with more info; I think the problem is related to session.timeout.ms – Shahbour Mar 12 '16 at 16:22
  • Are you using 2 versions of Kafka? I am not sure what kind of consequences that can have. – Nautilus Mar 12 '16 at 17:27
  • I had to use 0.8.2 on the producer because that machine has Java 1.6 and I can't update it – Shahbour Mar 12 '16 at 17:38
  • Then you should use 0.8.2 across your whole setup. I don't think anybody has done real testing using different versions of Kafka, so the problems you may have are unknown and unpredictable. Even if the producer code didn't change much, the jump from 0.8.x to 0.9.x is considered a major upgrade. BTW: if your coordinator were dying because of session.timeout.ms, (believe me) everything wouldn't be working "great" as you say, because you wouldn't be able to consume messages once your client is out of the consumer group, and you would be seeing other INFO messages in the log. – Nautilus Mar 12 '16 at 17:52
  • I am not getting anything in the Kafka log. I will try to increase the log level, but I don't know where to do that. Any guide? – Shahbour Mar 12 '16 at 17:55
  • That's why I am saying it is likely that this is not the cause of this error. – Nautilus Mar 13 '16 at 00:55
  • I found that it is related to connections.max.idle.ms – Shahbour Mar 20 '16 at 20:54
7

I faced the same issue. Finally, after following Shannon's recommendation about TRACE logging, I used:

logging.level.org.apache.kafka=TRACE

This revealed that my client was trying to resolve Euler:9092 as the coordinator... a local name! So I commented out and changed the listeners and advertised.listeners values in the server.properties file. It is working now! :-)
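A quick way to confirm this kind of problem from the client machine is to check whether the host name the broker advertises can be resolved there. The sketch below uses only the JDK; `AdvertisedHostCheck` and `canResolve` are hypothetical names, and "Euler" stands in for whatever host appears in your own TRACE output.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AdvertisedHostCheck {
    // Returns true if this machine can resolve the given host name or IP literal.
    static boolean canResolve(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Euler" is the broker-local name from this answer; pass your own as the first argument.
        String host = args.length > 0 ? args[0] : "Euler";
        System.out.println(host + " resolvable: " + canResolve(host));
    }
}
```

If the name does not resolve on the client, the broker is most likely advertising an address that is only valid locally.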

3

In my case, the message appeared in the logs when I tried to assign partitions manually. I then read the following note in the API docs of the new consumer:

It is also possible for the consumer to manually assign specific partitions (similar to the older "simple" consumer) using assign(Collection). In this case, dynamic partition assignment and consumer group coordination will be disabled.

That is, if you have code like this:

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.assign( Arrays.asList(
            new TopicPartition("topic", 0),
            new TopicPartition("topic", 1)
    ));

then the message "Marking the coordinator 2147483647 dead" will always appear in the logs.

vasyaod
  • Yes, I do have this in my code. The reason is that I want to get the last 1000 messages on restart. `TopicPartition partition0 = new TopicPartition("act", 0); consumer.assign(Arrays.asList(partition0));` – Shahbour Jun 20 '16 at 07:46
  • 1
    Yeah, with Kafka 0.9 and manually assigned partitions, this message appears when a connection is idle too long; however, the consumer appears to silently recover and continue reading messages. To verify, you can set the log level of org.apache.kafka.common.network.Selector to TRACE. – Shannon Jun 05 '17 at 17:02
  • I would like to point out that this could also be a symptom of extreme Kafka configurations. When weird behavior is noticed, one should always challenge every property override and find a good reason for it to be applied; otherwise, the default is always right. A particular example is "Changes to Heartbeat Behavior in Recent Kafka Versions" in the O'Reilly Kafka book. – reim Apr 25 '18 at 10:41
  • A reference to an actual implementation of mine: in a Spark streaming application our batches could last up to five minutes, so we adjusted the Kafka properties to these settings: "heartbeat.interval.ms" -> "30000", "session.timeout.ms" -> "90000", "request.timeout.ms" -> "120000" – reim Apr 26 '18 at 08:00
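The three values in the last comment follow the usual ordering of these timeouts. As a rough sketch, the check below encodes two rules of thumb: `heartbeat.interval.ms` should be lower than `session.timeout.ms` (typically at most a third of it), and `request.timeout.ms` should be larger than `session.timeout.ms`, which older consumer versions also enforce at construction time. The class `TimeoutSanityCheck` and its `check` method are hypothetical helpers, not part of the Kafka client.

```java
public class TimeoutSanityCheck {
    // Returns null when the settings are consistent, otherwise a description of the problem.
    static String check(long heartbeatMs, long sessionMs, long requestMs) {
        if (heartbeatMs >= sessionMs) {
            return "heartbeat.interval.ms must be lower than session.timeout.ms";
        }
        if (heartbeatMs * 3 > sessionMs) {
            return "heartbeat.interval.ms is usually at most 1/3 of session.timeout.ms";
        }
        if (requestMs <= sessionMs) {
            return "request.timeout.ms should be larger than session.timeout.ms";
        }
        return null;
    }

    public static void main(String[] args) {
        // Values quoted in the comment above.
        String problem = check(30000L, 90000L, 120000L);
        System.out.println(problem == null ? "settings look consistent" : problem);
    }
}
```

Running a check like this before overriding defaults makes it easier to justify each override, as the earlier comment recommends.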
1

This basically means you are not able to reach Kafka.
In my case I was running Kafka in a Vagrant box, and starting the VPN refreshed
the Vagrant IP, so the client could no longer connect to it.
Possible solution: in this case, stop the VPN and start your Vagrant box.

Kundan Atre
1

This may also be related to a long stop-the-world garbage collection phase. In my case I encountered this message after GC pauses longer than 10 seconds.
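To see whether GC is a plausible cause, one option is to sample the JVM's cumulative collection time and compare the growth between two samples with the consumer's timeouts. This is only a sketch using the standard `java.lang.management` API; `GcPauseReport` and its methods are hypothetical names, and the figure is cumulative GC time rather than the length of a single pause, so treat it as an upper bound per sampling interval.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseReport {
    // Sums cumulative collection time (ms) across all collectors; undefined (-1) values are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True when the GC time accumulated between two samples reaches the given timeout.
    static boolean reachesTimeout(long beforeMs, long afterMs, long timeoutMs) {
        return (afterMs - beforeMs) >= timeoutMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(1000L); // sampling interval, chosen arbitrarily for this sketch
        long after = totalGcMillis();
        System.out.println("GC time in interval (ms): " + (after - before));
        System.out.println("Reaches 30000 ms session timeout: " + reachesTimeout(before, after, 30000L));
    }
}
```

GC logging gives exact pause lengths and is the better tool once this quick check points in that direction.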

dux2
0

This error mostly occurs when there is a conflict between coordinator and consumer. The first thing you should do is to expose the listener port in server.properties and secondly you need to remove all the logs under kafka-logs. Don't forget to restart the server and zookeeper after these steps. It will resolve the issue.

joanis
wilky
-1

I faced this issue today and solved it (temporarily, might I add). I've posted an answer here on how I did it.

Community
Ankush92
  • I don't think they are related. In your case the firewall prevented the connection, while in our case the client is disconnected and reconnected after a fixed time – Shahbour Apr 21 '17 at 11:35
  • Are you still facing this problem? How did you solve it anyway? – Ankush92 Apr 21 '17 at 11:54
  • No, it was on an old project. As I recall, per one of the answers, it was related to specifying a certain partition – Shahbour Apr 21 '17 at 12:14