1

When there are updates done to an underlying topic of a GlobalKTable, what is the logic for all instances of KStream apps to get the latest data? Below are my follow up questions:

  1. Would the GlobalKTable be locked at record level or table level when the updates are happening?
  2. According to this blog: Kafka GlobalKTable Latency Issue, can the latency go upto 0.5s?! If so, is there any alternative to reduce the latency?
  3. Since GlobalKTable uses RocksDB by default as the state store, are all features of RocksDB available to use?

I understand the GlobalKTable should not be used for use-cases that require frequent updates to the lookup data. Is there any other key-value store that we can use for use-cases that might require updates on table data - Redis for example?

I could not find much documentation about GlobalKTable and its internals. Is there any available documentations available?

guru
  • 409
  • 4
  • 21

2 Answers2

2

GlobalKTables are updated async. Hence, there is no guarantee whatsoever when the different instances are updated.

Also, the "global thread" uses a dedicated "global consumer" that you can fine tune individually to reduce latency: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#naming

RocksDB is integrated via JNI and the JNI interface does not expose all features of RocksDB. Furthermore, the "table" abstraction "hides" RocksDB to some extent. However, you can tune RocksDB via rocksdb.config.setter (https://docs.confluent.io/current/streams/developer-guide/config-streams.html#rocksdb-config-setter).

Evgeniy Berezovsky
  • 18,571
  • 13
  • 82
  • 156
Matthias J. Sax
  • 59,682
  • 7
  • 117
  • 137
  • Got it. If we look at a single instance, is the update async - in the sense that when a stream is processing, would it pause execution to update the `GlobalKTable` or would it continue as is and the instance gets the latest data async? – guru Jun 04 '20 at 03:13
  • 1
    The global state/table is updated using a dedicated thread, So processing on the "regular" input topic happens in parallel. – Matthias J. Sax Jun 04 '20 at 05:05
0

The Javadocs for KStream#join() are pretty clear that joins against a GlobalKTable only occur as records in the stream are processed. Therefore, to answer your question, there are no automatic updates that happen to the underlying KStreams: new messages would need to be processed in them in order for them to see the updates.

"Table lookup join" means, that results are only computed if KStream records are processed. This is done by performing a lookup for matching records in the current internal GlobalKTable state. In contrast, processing GlobalKTable input records will only update the internal GlobalKTable state and will not produce any result records.

  1. If a GlobalKTable is materialized as a key-value store, most of the methods for iterating over and mutating KeyValueStore implementations use the synchronized keyword to prevent interference from multiple threads updating the state store concurrently.

  2. You might be able to reduce the latency by using an in-memory key-value store, or by using a custom state store implementation.

  3. Interacting with state stores is controlled via a set of interfaces in Kafka Streams, for example KeyValueStore, so in this sense you're not interacting directly with RocksDB APIs.

ck1
  • 5,243
  • 1
  • 21
  • 25
  • Thanks @ck1. I have a few follow-up questions. – guru Jun 01 '20 at 18:32
  • Questions in order: 1. the `synchronized` keyword is on the entire table or on a particular key-value pair? Can you please point me to a Javadoc if any? 2. `GlobalKTable` is itself materialized as an in-memory key-value store, is my understanding correct? Are you suggesting that `GlobalKTable` has latency issues and it is better to have different in-memory key-value store? 3. I understand that we cannot interact with RocksDB directly, but can we somehow configure RocksDB via those Kafka Streams API? – guru Jun 01 '20 at 18:41