
I have four instances of a Kafka Streams application running with the same application id. All the input topics have a single partition. To achieve scalability I have passed the data through an intermediate dummy topic with multiple partitions. I have set `request.timeout.ms` to 4 minutes.

The Kafka Streams instances go into the ERROR state without any exception being thrown, which makes it difficult to figure out the exact issue. Any ideas?
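For reference, the setup described above can be sketched roughly as follows. Topic names, the broker list, and the processing step are placeholders, not taken from the question; `through()` is the repartitioning API of the Kafka Streams version visible in the logs below:

```java
// Sketch of the described topology; names are illustrative assumptions.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "app-new-03");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
// request.timeout.ms = 4 minutes, as stated in the question
props.put("request.timeout.ms", 240000);

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("single-partition-input")
       // Route through a multi-partition topic so all four instances
       // can share the work despite the single-partition inputs.
       .through("multi-partition-intermediate")
       .foreach((key, value) -> { /* actual processing */ });
```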

[INFO ] 2018-01-09 12:30:11.579 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] StreamThread:939 - stream-thread [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] Shutting down
[INFO ] 2018-01-09 12:30:11.579 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] StreamThread:888 - stream-thread [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN.
[INFO ] 2018-01-09 12:30:11.595 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] KafkaProducer:972 - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[INFO ] 2018-01-09 12:30:11.605 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] StreamThread:972 - stream-thread [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] Stream thread shutdown complete
[INFO ] 2018-01-09 12:30:11.605 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] StreamThread:888 - stream-thread [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD.
[WARN ] 2018-01-09 12:30:11.605 [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] KafkaStreams:343 - stream-client [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4] All stream threads have died. The Kafka Streams instance will be in an error state and should be closed.
[INFO ] 2018-01-09 12:30:11.605 [new-03-cb952917-bd06-4932-8c7e-62986126a5b4-StreamThread-1] KafkaStreams:268 - stream-client [app-new-03-cb952917-bd06-4932-8c7e-62986126a5b4] State transition from RUNNING to ERROR.
Ian Campbell
Viswapriya
    Try to register an `UncaughtExceptionHandler` to get more details: https://docs.confluent.io/current/streams/developer-guide/write-streams.html or increase log level to DEBUG – Matthias J. Sax Jan 09 '18 at 22:07
  • Yeah! Log level is already in debug mode and there is an uncaughtExceptionHandler already registered to the Kafka stream – still nothing is being logged. – Viswapriya Jan 10 '18 at 07:37
  • That's weird... What is logged before the `Shutting down` message? – Matthias J. Sax Jan 10 '18 at 07:57
  • I have a customized stream partitioner. A log line from it is being continuously written: "EventStreamPartitioner:20 - code 'isro' and partition '109'". – Viswapriya Jan 11 '18 at 05:58
  • I have a consumer offset reset tool which changes the offset for a topic using `OffsetCommitRequest`. I stopped the application and used the tool to change the offsets of the input topic and intermediate topic. I have two brokers and I sent the request to one broker. Then I restarted the application, after which it continuously ran into the ERROR state. I think the offset change led to corruption of the log files. Once I changed the consumer group id it worked. But I am still not sure what the reason could have been for the Kafka log files to get corrupted after the offset change. – Viswapriya Jan 11 '18 at 06:02
  • Hard to say... But glad it works now. No idea what `code 'isro' and partition '109'` means... Does your EventStreamPartitioner fail with an exception? – Matthias J. Sax Jan 11 '18 at 08:14
  • Based on the hash value of the code I am assigning the partition. That logger was just written to understand which application instance is processing events of which code. Nope, the EventStreamPartitioner doesn't seem to throw any exceptions. If any, the uncaught exception handler would have logged it since I have a logger within it to log exceptions. – Viswapriya Jan 12 '18 at 05:18
  • @MatthiasJ.Sax can you please state how it works? I am stuck here. – Amare May 07 '20 at 12:50
  • @Amare Not sure what your question is. Maybe just post your own new question? – Matthias J. Sax May 07 '20 at 23:53
  • @MatthiasJ.Sax you said glad it works, and I was curious how it worked for you? I had the same issue with Kafka Streams 2.5 and in fact I needed to add a dummy topic for it to work, but I don't know why. I have this same thing at https://stackoverflow.com/questions/61342530/kafka-streams-2-5-0-requires-input-topic# – Amare May 09 '20 at 12:33
  • @Viswapriya said `Once I changed the consumer group id it worked.` (I never had an issue, I just tried to help, so you need to ask @Viswapriya). -- Also note that this question is quite old. For the other question you linked, it's a regression bug in 2.5.0. And if you read the comments, it's already fixed for future 2.5.1 and 2.6.0 releases. So either you keep your workaround or you need to downgrade to 2.4.x and wait for 2.5.1 or 2.6.0 release. – Matthias J. Sax May 10 '20 at 20:39
  • Facing the exact same issue with 2.7.1 version, if I change consumer ID it works. Not sure of the root cause. – sampopes Jun 17 '21 at 14:44
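The handler suggested in the first comment can be registered roughly like this. This is a sketch using the pre-2.6 `Thread.UncaughtExceptionHandler` overload, which matches the Kafka version in the logs; the logger is an assumed SLF4J instance:

```java
KafkaStreams streams = new KafkaStreams(builder.build(), props);

// Log whatever kills a stream thread before the instance transitions
// to ERROR; without this, threads can die without any trace.
streams.setUncaughtExceptionHandler((Thread thread, Throwable throwable) ->
        log.error("Stream thread {} died unexpectedly", thread.getName(), throwable));

streams.start();
```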

2 Answers


The asker shared their solution in the comments:

Once I changed the consumer group id it worked.

It is also worth noting that a related issue (which may or may not have the same root cause) was introduced in recent versions and now appears to be fixed in Kafka 2.5.1 and 2.6.0 and above.

As such, people experiencing this today may want to check whether they are on a high (or low) enough version to avoid this issue.

Dennis Jaheruddin

You may also want to set the `default.production.exception.handler` Kafka Streams property to a class that implements `ProductionExceptionHandler` and, unlike the default `DefaultProductionExceptionHandler`, logs the error before triggering a permanent failure state.
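A minimal sketch of such a handler, assuming SLF4J for logging (the class name is illustrative):

```java
public class LoggingProductionExceptionHandler implements ProductionExceptionHandler {

    private static final Logger log =
            LoggerFactory.getLogger(LoggingProductionExceptionHandler.class);

    @Override
    public ProductionExceptionHandlerResponse handle(
            ProducerRecord<byte[], byte[]> record, Exception exception) {
        // Log the failing record before failing, unlike the silent default.
        log.error("Failed to produce record to topic {}", record.topic(), exception);
        // Keep the default fail-fast behavior; return CONTINUE to skip instead.
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```

It is enabled with `props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, LoggingProductionExceptionHandler.class)`.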