
In the Kafka Streams Developer Guide it says:

Kafka Streams applications can only communicate with a single Kafka cluster specified by this config value. Future versions of Kafka Streams will support connecting to different Kafka clusters for reading input streams and writing output streams.

Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?

Could I create multiple KafkaStreams instances with different properties that connect to different clusters?

Magnus Reftel
mixiul__

2 Answers


It means that a single application can only connect to one cluster.

  • You cannot read a topic from cluster A and write the result of your computation to cluster B.
  • It's not possible to read two topics from two different clusters with the same instance.

"Could I create multiple KafkaStreams instances with different properties that connect to different clusters?"

Yes, absolutely. But those different instances will be different applications. (Think "consumer groups".)

Update:

Within a single JVM, you can create as many KafkaStreams instances as you like, and you can configure each one to connect to a different cluster. If you want to do the same processing on all of them, you can use the same KStreamBuilder (StreamsBuilder since Kafka 1.0) for every instance.
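As a minimal sketch of the per-cluster configuration this implies: the helper class `ClusterConfigs`, the application ids, and the host names used with it are hypothetical and only for illustration. The keys `application.id` and `bootstrap.servers` are the standard Kafka Streams settings, and the sketch uses plain `java.util.Properties` so it compiles without the Kafka libraries.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): builds one configuration per cluster.
final class ClusterConfigs {

    private ClusterConfigs() {}

    // Each KafkaStreams instance gets its own Properties object, so each one
    // carries exactly one "bootstrap.servers" value, i.e. exactly one cluster.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Same keys as StreamsConfig.APPLICATION_ID_CONFIG and
        // StreamsConfig.BOOTSTRAP_SERVERS_CONFIG.
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        return props;
    }
}
```

With the Kafka Streams library on the classpath, each configuration would then go to its own instance, for example `new KafkaStreams(topology, ClusterConfigs.configFor("events-a", "cluster-a.example:9092"))` and a second one pointing at cluster B, each started with `start()`. The two instances run independently inside the same JVM.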

Matthias J. Sax
  • I think mixiul__ asked whether it is possible to use a single Java application that creates multiple `KafkaStreams` instances which then connect to different clusters each. – miguno Aug 24 '17 at 07:06
  • That is correct. For context, I want to consume the same event type from two different clusters - within a single java application. I was hoping to have my processing logic in one class that I can instantiate twice, each with its own cluster specific KafkaStreams instance. – mixiul__ Aug 24 '17 at 08:24
  • Updated my answer: yes, you can do that. – Matthias J. Sax Aug 24 '17 at 15:59
  • _"Within a single JVM, you can create as many KafkaStreams instances as you like."_ Why would you do that? I'm curious what the use cases could be (from trusted sources). – Jacek Laskowski Jan 13 '19 at 12:19
  • If you want to process data from different clusters, which is what the question is about. – Matthias J. Sax Jan 13 '19 at 16:56

Just to add to the excellent answer from @Matthias J. Sax.

"Does this mean that my whole application can only connect to a single Kafka Cluster or each instance of KafkaStreams can only connect to a single cluster?"

I think there are two questions here.

It depends on the definition of "my whole application". It could be a single KafkaStreams instance, multiple KafkaStreams instances on a single JVM, or perhaps multiple KafkaStreams instances on a single JVM in a Docker container that runs as a pod. Whichever you mean, "my whole application" is too broad and not very precise.

The point is that there is no way to create a single KafkaStreams instance that talks to multiple Kafka clusters, since it is configured through properties that are key-value pairs in a map. That alone is enough to answer your own question, isn't it?


Being unable to use two or more Kafka clusters in a single Kafka Streams application is one of the differences between Kafka Streams and Spark Structured Streaming. The latter can use as many Kafka clusters as you want, so you could build pipelines between different Kafka clusters.

Jacek Laskowski