
We are using kafka-streams 2.3.1, and I've just noticed that if the broker is down, the Streams app seems content to keep trying to connect forever.

```java
new KafkaStreams(createTopology(), properties()).start();
```

```
o.apache.kafka.clients.NetworkClient - [AdminClient clientId=test] Connection to node -1 (broker/127.0.0.1:9092) could not be established. Broker may not be available.
```

The Streams state is REBALANCING while this is going on, so there is no good way to tell whether the connection is simply broken.

Is there a way to set either a timeout or a maximum number of retries for broker connection attempts?

Leo
  • Do you have more than one broker? Are you providing them all as bootstrap servers? – OneCricketeer Dec 19 '19 at 05:12
  • I'm testing it locally with just one broker, but I would assume the behaviour doesn't change (i.e. it keeps trying to connect). I configure it with `config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");` – Leo Dec 19 '19 at 05:50
  • `broker` points to localhost, and I don't start the broker (that's why it can't connect) – Leo Dec 19 '19 at 05:51
  • That makes sense, so are you trying to run just a unit test? Or why are you trying to run the app if the broker isn't up? – OneCricketeer Dec 19 '19 at 07:41
  • trying to run the app without Kafka running – Leo Dec 19 '19 at 22:27
  • Yes, but what good would that do? – OneCricketeer Dec 19 '19 at 22:47
  • It's to test what happens when broker becomes not available. "Retrying forever" in REBALANCING state doesn't seem like the best strategy. I'm trying to find the way to fail the application after a number of retries or timeout – Leo Dec 19 '19 at 22:57
  • There is a `session.timeout.ms`. Also `retry.backoff.ms` and `request.timeout.ms`. I don't think there is a hard counter on connection retries, because it just round-robins over the bootstrap server list until a healthy broker is online... You should guard against that before you start your stream by trying to establish a connection on your own (via a port check or `AdminClient.describeCluster`, or by checking the input topic with `AdminClient.describeTopics`) – OneCricketeer Dec 19 '19 at 23:54
  • None of those timeouts does the job. Thanks for the pointer to AdminClient; that helps with the problems at startup. I still find it weird that if the broker goes down midway, the Streams app will just get stuck in the REBALANCING state forever, and there doesn't seem to be any good way to deal with that – Leo Dec 23 '19 at 04:10
  • You can assign a listener to see when the partitions get assigned and revoked, then forcibly stop the application, if you really wanted to. But otherwise, Kafka Streams acts the same as a long polling loop, waiting for available connections – OneCricketeer Dec 23 '19 at 05:20
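
A minimal sketch of the listener-based approach from the last comment, assuming you want to give up once the app has sat in REBALANCING for longer than some deadline. `StuckStateWatchdog` is a hypothetical helper, not part of Kafka Streams; it takes state names as strings (e.g. `newState.name()`) so it has no Kafka dependency and can be tested without a broker. It only covers the REBALANCING case; it will not fire if the app stays RUNNING.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Kafka Streams API): remembers when the app entered
 * a "suspect" state such as REBALANCING and reports once it has stayed in
 * suspect states longer than the configured timeout.
 */
final class StuckStateWatchdog {
    private final Set<String> suspectStates;
    private final long timeoutMillis;
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis
    private boolean inSuspectState = false;
    private long enteredAtMillis = 0L;

    StuckStateWatchdog(Set<String> suspectStates, Duration timeout, LongSupplier clockMillis) {
        this.suspectStates = new HashSet<>(suspectStates);
        this.timeoutMillis = timeout.toMillis();
        this.clockMillis = clockMillis;
    }

    /** Feed this from KafkaStreams.StateListener#onChange with newState.name(). */
    synchronized void onStateChange(String newState) {
        if (!suspectStates.contains(newState)) {
            inSuspectState = false;                    // healthy again: clear the timer
        } else if (!inSuspectState) {
            inSuspectState = true;                     // first entry: start the timer
            enteredAtMillis = clockMillis.getAsLong();
        }
        // Repeated notifications for a suspect state keep the original start time.
    }

    /** Poll this from a separate thread; true once the timeout has elapsed. */
    synchronized boolean isStuck() {
        return inSuspectState
                && clockMillis.getAsLong() - enteredAtMillis > timeoutMillis;
    }
}
```

In the application you would register something like `streams.setStateListener((newState, oldState) -> watchdog.onStateChange(newState.name()))` and poll `isStuck()` from a `ScheduledExecutorService`, calling `streams.close(Duration)` from that executor thread. Calling `close()` inside the listener itself is best avoided: the listener runs on a Kafka Streams internal thread, and blocking it on shutdown can hang.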

1 Answer


There is unfortunately no good workaround for this problem. The issue is actually a consumer issue: the consumer just keeps trying to reconnect but does not surface its internal state to Kafka Streams. Also, it is not possible to configure the consumer to give up at some point.

There is a KIP to add a "DISCONNECTED" state to Kafka Streams, but there has not been much progress on it lately... It's complicated... https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams

Matthias J. Sax
  • I've also just realised that if all brokers go down after the initial connection was established, then the Streams state won't even be REBALANCING; it will be RUNNING. Essentially there's no good way (pinging with AdminClient regularly?) to health-check a Streams app. It's always RUNNING – Leo Jan 02 '20 at 23:45
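
Since the state can stay RUNNING, an external probe is one option, along the lines of the port check suggested in the comments. A minimal sketch, assuming a plain TCP connect is an acceptable proxy for broker availability: `BootstrapProbe` is a hypothetical helper that reports whether any entry in a `bootstrap.servers`-style string accepts a connection within a timeout. A successful connect only shows the port is open, not that the broker can serve your topics; `AdminClient.describeCluster` or `describeTopics`, as mentioned above, are stronger checks.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe (not a Kafka API): is any bootstrap server accepting TCP connections? */
final class BootstrapProbe {
    private BootstrapProbe() {}

    /** bootstrapServers uses the "host:port,host:port" format of bootstrap.servers. */
    static boolean anyReachable(String bootstrapServers, int timeoutMillis) {
        for (String entry : bootstrapServers.split(",")) {
            String hostPort = entry.trim();
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0) {
                continue; // no host or no port: skip malformed entry
            }
            try {
                int port = Integer.parseInt(hostPort.substring(colon + 1));
                if (canConnect(hostPort.substring(0, colon), port, timeoutMillis)) {
                    return true; // one reachable server is enough
                }
            } catch (IllegalArgumentException e) {
                // non-numeric port (NumberFormatException) or port out of range: skip entry
            }
        }
        return false;
    }

    /** Attempts a TCP connect; timeoutMillis should be positive (0 would block indefinitely). */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }
}
```

You could call `BootstrapProbe.anyReachable("broker:9092", 2000)` once before `start()` and fail fast, and optionally again on a schedule, closing the app yourself after several consecutive failures. Note that a failed probe cannot distinguish a dead broker from a network problem on the client side.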