
In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.

Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?

I read the KStream.leftJoin documentation but could not find an exact difference.

Jacek Laskowski
Ajit Dongre

1 Answer


KTable vs. GlobalKTable

A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable has a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it obviously needs more memory. The advantage is that you can do a KStream-GlobalKTable join using a non-key attribute from the stream. For a KStream-KTable join, using a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before doing the join; this results in a repartitioning step for the stream before the join can be computed.
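To make the trade-off concrete, here is a toy model in plain Java; it does not use the Kafka Streams API. It assumes a hypothetical customers table keyed by customer ID, and hypothetical orders keyed by order ID that carry the customer ID in their value. The sharded copies stand in for a KTable, the replicated copies for a GlobalKTable, and `ownerOf` is a simplified stand-in for Kafka's partitioner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model, NOT Kafka Streams code: contrasts a sharded table (KTable-like)
 * with a fully replicated one (GlobalKTable-like) when a stream record must be
 * joined on a non-key attribute (an order's customerId).
 */
public class ShardedVsGlobal {

    /** Simplified partitioner; real Kafka hashes the serialized key with murmur2. */
    static int ownerOf(String key, int instances) {
        return Math.floorMod(key.hashCode(), instances);
    }

    /** KTable-like: each instance stores only the keys it owns. */
    static List<Map<String, String>> shard(Map<String, String> table, int instances) {
        List<Map<String, String>> copies = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            copies.add(new HashMap<>());
        }
        table.forEach((k, v) -> copies.get(ownerOf(k, instances)).put(k, v));
        return copies;
    }

    /** GlobalKTable-like: every instance stores the full table. */
    static List<Map<String, String>> replicate(Map<String, String> table, int instances) {
        List<Map<String, String>> copies = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            copies.add(new HashMap<>(table));
        }
        return copies;
    }

    /**
     * Joins an order (keyed by orderId, carrying customerId in its value).
     * With a global table, the instance that received the order looks up
     * customerId locally. With a sharded table, the order must first be
     * re-keyed by customerId and sent to the owning instance (the
     * repartitioning step), modelled here by recomputing the owner.
     */
    static String joinOrder(List<Map<String, String>> copies, boolean global,
                            String orderId, String customerId) {
        int instances = copies.size();
        int receivedOn = ownerOf(orderId, instances);
        int lookupOn = global ? receivedOn : ownerOf(customerId, instances);
        return copies.get(lookupOn).get(customerId);
    }

    public static void main(String[] args) {
        Map<String, String> customers = Map.of("c-1", "Alice", "c-2", "Bob", "c-7", "Carol");
        int instances = 3;
        List<Map<String, String>> sharded = shard(customers, instances);
        List<Map<String, String>> global = replicate(customers, instances);

        // Storage cost: each entry stored once vs. once per instance.
        int shardedTotal = sharded.stream().mapToInt(Map::size).sum();
        int globalTotal = global.stream().mapToInt(Map::size).sum();
        System.out.println("sharded entries: " + shardedTotal + ", global entries: " + globalTotal);

        System.out.println(joinOrder(sharded, false, "order-1", "c-7"));
        System.out.println(joinOrder(global, true, "order-1", "c-7"));
    }
}
```

Running `main` prints `sharded entries: 3, global entries: 9` followed by `Carol` twice: both variants find the customer, but the sharded one only does so because the order is re-routed to the instance that owns `c-7`, while the replicated one pays with three times the storage. In the real DSL, the two variants would look roughly like `orders.join(customersGlobal, (orderId, order) -> order.customerId(), joiner)` versus `orders.selectKey((orderId, order) -> order.customerId()).join(customersTable, joiner)`, where `order.customerId()` and `joiner` are hypothetical.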

Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps. Thus, the updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, and thus updates to the GlobalKTable are completely decoupled from the processing of the stream records (i.e., you get weaker semantics).
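The following toy model (plain Java 16+, not Kafka Streams internals) illustrates why the missing time synchronization matters. A single table key is updated at timestamps 1 and 10, and a stream record with timestamp 5 looks it up. The `decoupled` variant assumes, as one possible interleaving, that the global table has already consumed both updates before the stream record is processed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model, NOT Kafka Streams internals: one table key whose value changes
 * over time, and stream records that look it up.
 */
public class TimeAlignment {

    /** An input record; isTableUpdate tells the two inputs apart. */
    record Event(long timestamp, boolean isTableUpdate, String value) {}

    /**
     * KTable-like: interleave both inputs by timestamp, so a stream record
     * only sees table updates with an earlier (or equal) timestamp.
     */
    static List<String> timeAligned(List<Event> tableUpdates, List<Event> streamRecords) {
        List<Event> all = new ArrayList<>(tableUpdates);
        all.addAll(streamRecords);
        all.sort(Comparator.comparingLong(Event::timestamp)); // stable sort
        return process(all);
    }

    /**
     * GlobalKTable-like: the table is updated independently of stream time;
     * modelled here (one possible interleaving) as all table updates being
     * applied before any stream record is processed.
     */
    static List<String> decoupled(List<Event> tableUpdates, List<Event> streamRecords) {
        List<Event> all = new ArrayList<>(tableUpdates);
        all.addAll(streamRecords);
        return process(all);
    }

    /** Applies events in the given order and records each stream-side join result. */
    private static List<String> process(List<Event> events) {
        String current = null;
        List<String> results = new ArrayList<>();
        for (Event e : events) {
            if (e.isTableUpdate()) {
                current = e.value();
            } else {
                results.add(e.value() + " joined with " + current);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> table = List.of(new Event(1, true, "v1"), new Event(10, true, "v2"));
        List<Event> stream = List.of(new Event(5, false, "order@5"));
        System.out.println(timeAligned(table, stream)); // [order@5 joined with v1]
        System.out.println(decoupled(table, stream));   // [order@5 joined with v2]
    }
}
```

In the decoupled case, the stream record joins with a table value from its "future". Kafka Streams' timestamp synchronization for KTables is itself best-effort, so the aligned variant is a simplification as well.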

For further details, see KIP-99: Add Global Tables to Kafka Streams.

leftJoin() vs. outerJoin()

About left and outer joins: they behave like a left-outer and a full-outer join in a database, respectively.

For a left outer join, you might "lose" records from your right input stream if there is no join match for them on the left-hand side.

For a (full) outer join, no data is dropped, and each input record of both streams will be in the result stream.
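A toy model of the final results of a windowed stream-stream join can make the difference visible. It is plain Java 16+ rather than Kafka Streams code, and it ignores when and in which order Kafka Streams actually emits results. As an assumption, records match if they share a key and their timestamps differ by at most `windowMs` (a symmetric window).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the final results of windowed stream-stream joins
 * (NOT Kafka Streams code; emission timing and ordering are ignored).
 */
public class LeftVsOuterJoin {

    record Rec(String key, long timestamp, String value) {}

    /** Same key and timestamps at most windowMs apart. */
    static boolean matches(Rec l, Rec r, long windowMs) {
        return l.key().equals(r.key()) && Math.abs(l.timestamp() - r.timestamp()) <= windowMs;
    }

    /** outer == false: left join; outer == true: full outer join. */
    static List<String> join(List<Rec> left, List<Rec> right, long windowMs, boolean outer) {
        List<String> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (Rec l : left) {
            boolean found = false;
            for (int i = 0; i < right.size(); i++) {
                if (matches(l, right.get(i), windowMs)) {
                    out.add(l.value() + "+" + right.get(i).value());
                    rightMatched[i] = true;
                    found = true;
                }
            }
            // Both left and outer joins keep an unmatched left record (paired with null).
            if (!found) {
                out.add(l.value() + "+null");
            }
        }
        // Only the outer join keeps unmatched right records; the left join drops them.
        if (outer) {
            for (int i = 0; i < right.size(); i++) {
                if (!rightMatched[i]) {
                    out.add("null+" + right.get(i).value());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> left = List.of(new Rec("A", 100, "a1"), new Rec("B", 100, "b1"));
        List<Rec> right = List.of(new Rec("A", 150, "x1"), new Rec("C", 100, "z1"));
        System.out.println(join(left, right, 1000, false)); // [a1+x1, b1+null]
        System.out.println(join(left, right, 1000, true));  // [a1+x1, b1+null, null+z1]
    }
}
```

The right-side record `z1` only survives in the outer join. In the DSL, the corresponding calls would be along the lines of `left.leftJoin(right, joiner, JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofSeconds(1)))` and `left.outerJoin(right, joiner, ...)`, where `joiner` is a hypothetical `ValueJoiner`; `ofTimeDifferenceWithNoGrace` exists in Kafka 3.0+, while older versions used `JoinWindows.of`.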

Jacek Laskowski
Matthias J. Sax
  • So, is GlobalKTable slow during writes? Because it has to write the changed data to all the application instances! – JavaTechnical Apr 30 '19 at 07:02
  • Not sure what you mean by slow. Writes go only to the "global" input topic, and each app instance consumes the "global" topic to update its copy of the `GlobalKTable`. – Matthias J. Sax Apr 30 '19 at 12:37
  • I thought that GlobalKTable is not a Kafka topic and that it resides *only* on the application side and that it needs to be replicated across multiple instances. – JavaTechnical Apr 30 '19 at 13:10
  • Please read the docs: https://docs.confluent.io/current/streams/concepts.html#globalktable – Matthias J. Sax May 01 '19 at 22:08
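The mechanism described in the comments above (a single write to the global input topic, which every application instance then consumes into its own local copy) can be sketched as a toy model in plain Java 16+. The `Topic` and `Instance` classes are hypothetical stand-ins, not Kafka classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model, NOT Kafka code: one append-only "topic", many local replicas. */
public class GlobalTopicReplicas {

    record Update(String key, String value) {}

    /** Stand-in for the "global" input topic: each update is appended exactly once. */
    static final class Topic {
        final List<Update> log = new ArrayList<>();

        void append(String key, String value) {
            log.add(new Update(key, value));
        }
    }

    /** Stand-in for one application instance's local copy of the GlobalKTable. */
    static final class Instance {
        final Map<String, String> store = new HashMap<>();
        int offset = 0; // next log position this instance will read

        /** Consumes everything not yet seen; each instance reads at its own pace. */
        void poll(Topic topic) {
            while (offset < topic.log.size()) {
                Update u = topic.log.get(offset++);
                if (u.value() == null) {
                    store.remove(u.key()); // null value (tombstone) deletes the key
                } else {
                    store.put(u.key(), u.value());
                }
            }
        }
    }

    public static void main(String[] args) {
        Topic topic = new Topic();
        Instance a = new Instance();
        Instance b = new Instance();
        topic.append("c-1", "Alice");
        topic.append("c-2", "Bob");
        a.poll(topic);
        System.out.println("a=" + a.store + " b=" + b.store); // b has not polled yet
        topic.append("c-1", null);
        a.poll(topic);
        b.poll(topic);
        System.out.println("a=" + a.store + " b=" + b.store + " writes=" + topic.log.size());
    }
}
```

Each update is written to the log once, regardless of how many instances exist; an instance that has not polled yet simply lags behind, which is the decoupling described in the answer.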