
How does a global state store differ from a normal state store?

Does a global state store keep a copy of the data on all instances running on different machines? And how does it behave in case of a restart, given that a global state store doesn't use a changelog topic for restoration? In my scenario, the source topic for the global store has no key.

Adrien H
Mohit Singh
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the [ask] page for help clarifying this question. – sleepToken Mar 09 '20 at 18:42
  • I've changed it; now it's a simple question with sufficient details – Mohit Singh Mar 09 '20 at 18:49
  • Are you asking about the difference between GlobalKTable and a Streams state store (using RocksDB)? – Kevin Hooke Mar 10 '20 at 01:15
  • I want to know how a global state store works: does it internally create a changelog topic for restoration when the machine restarts? In my scenario the source topic for the global state store has no key (the key is null), so when I restart my machine, the global state store loads data directly from the source topic – Mohit Singh Mar 10 '20 at 06:51
  • Please read the tag description of `confluent-kafka` before using it. It's unrelated to Kafka Streams – OneCricketeer Mar 10 '20 at 12:54

1 Answer

  1. Does the global state store have a copy of the data on all instances running on different machines?

    Yes.

  2. How does it behave in case of a restart, given that the global state store doesn't use a changelog topic for restoration and, in my scenario, the source topic for the global store has no key?
    • GlobalKTable disables logging by default, so it will not push changes to a changelog topic for the GlobalKTable (it still creates the changelog topic, though). You have to populate the GlobalKTable from an input topic that has log compaction enabled (cleanup.policy=compact), where each message's key is the key you want to look up in your GlobalKTable. Kafka Streams will simply re-populate the GlobalKTable from that input topic when you restart the application.
    • Regarding "the source topic from the global store has no key": you have to map your source topic to a new topic as described above, using a KeyValueMapper, and enable log compaction on that new topic.
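The re-keying step described above could be sketched roughly like this. The topic names, the serdes, and the `extractId` helper are assumptions for illustration, not from the question; the compacted topic itself would need to be created with `cleanup.policy=compact` (e.g. via `kafka-topics.sh --create ... --config cleanup.policy=compact`):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class RekeyForGlobalStore {

    // Hypothetical helper: pull the "id" field out of a JSON value such as
    // {"id": "user-12345", "user_client": ["clientid-1", "clientid-2"]}.
    static String extractId(String json) {
        int field = json.indexOf("\"id\"");
        int start = json.indexOf('"', json.indexOf(':', field) + 1) + 1;
        return json.substring(start, json.indexOf('"', start));
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // The source topic has null keys, so derive a key from the value
        // and write to a compacted topic ("users-keyed" is a made-up name).
        builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .map((nullKey, value) -> KeyValue.pair(extractId(value), value))
               .to("users-keyed", Produced.with(Serdes.String(), Serdes.String()));

        // Build the GlobalKTable from the keyed, compacted topic; on restart,
        // Kafka Streams restores the store directly from this topic.
        GlobalKTable<String, String> users = builder.globalTable(
                "users-keyed",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as(
                        "users-global-store"));
    }
}
```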
Tuyen Luong
  • So the global state store doesn't create any changelog topic for restoration? Is the only solution to re-partition the data into another topic, which then becomes the source topic for the global state store? – Mohit Singh Mar 10 '20 at 06:46
  • The global store uses the input topic instead of a changelog topic for restoration, but it does create the changelog topic; this isn't clearly stated in the documentation https://docs.confluent.io/current/streams/concepts.html#globalktable – Tuyen Luong Mar 10 '20 at 06:50
  • After reading your question, I ran a simple topology to check and was really surprised by it. You can read more here https://stackoverflow.com/questions/52707748/why-does-kafka-streams-enforce-logging-disabled-for-globalktable-state-stores – Tuyen Luong Mar 10 '20 at 06:52
  • 1
    "only solution is re-partitioning of data to another topic that topic will be the source topic for global state store" exactly, and this topic must enable log compaction – Tuyen Luong Mar 10 '20 at 07:05
  • Can we enable logging on the global state store? If we do, will it work? I've seen many workarounds that create one more topic where we dump data from the source topic with a custom key, which the global state store then uses during restore. But in my scenario that's a little bit difficult. – Mohit Singh Mar 10 '20 at 07:09
  • No. You can try to enable logging in `Topology.addGlobalStore()` for later use in a `Processor`, but it will throw a `TopologyException("StateStore " + storeName + " for global table must not have logging enabled.")`. How is that difficult? – Tuyen Luong Mar 10 '20 at 07:17
  • `{ "id": "user-12345", "user_client": [ "clientid-1", "clientid-2" ] }` — this is my data from the input topic, with a null key. I want to create two global stores: one mapping `id` to the whole record, and another mapping each client id to a list of user ids, e.g. `"clientid-1": ["user-12345"]`, `"clientid-2": ["user-12345"]`. – Mohit Singh Mar 10 '20 at 07:26
  • You should ask another question. But my 2 cents: send these to 2 different topics and use 2 separate GlobalKTables (one topic also works if you have a common serializer). However, to extract `clientid-1` as the key you have to use another KTable to store the aggregated result so far for the `clientid` key, e.g. `"clientid-1": ["user-12345", "user-12346", "user-12347"]` – Tuyen Luong Mar 10 '20 at 07:45
  • The answer here also has an interesting approach: https://stackoverflow.com/questions/59029964/kafka-streams-use-cases-for-add-global-store?rq=1 – Tuyen Luong Mar 10 '20 at 07:47
  • "Continue to use the input topic and deserialize and invoke the processors during restoration as well" — how can we do this, given that during restore the global store bypasses the processor? – Mohit Singh Mar 10 '20 at 07:51
  • https://stackoverflow.com/questions/60613596/global-state-store-dont-create-change-log-topic-what-is-the-workaround-if-input – Mohit Singh Mar 10 '20 at 08:08
  • @MohitSingh will a client_id have multiple user_id or just one? – Tuyen Luong Mar 10 '20 at 14:54
  • 1
    client_id can have multiple user. https://stackoverflow.com/questions/60613596/global-state-store-dont-create-change-log-topic-what-is-the-workaround-if-input created new question for this – Mohit Singh Mar 10 '20 at 15:05
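The fan-out suggested in the comments — one topic keyed by user id and one with a (client id, user id) pair per client — could be sketched like this. The topic names and the JSON-parsing helpers are assumptions for illustration; a real application would use a proper JSON serde:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class FanOutUserClients {

    // Hypothetical helpers that parse records shaped like
    // {"id": "user-12345", "user_client": ["clientid-1", "clientid-2"]}.
    static String extractId(String json) {
        int field = json.indexOf("\"id\"");
        int start = json.indexOf('"', json.indexOf(':', field) + 1) + 1;
        return json.substring(start, json.indexOf('"', start));
    }

    static List<String> extractClients(String json) {
        int open = json.indexOf('[', json.indexOf("\"user_client\""));
        String body = json.substring(open + 1, json.indexOf(']', open));
        return Arrays.stream(body.split(","))
                     .map(s -> s.trim().replace("\"", ""))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source =
                builder.stream("source-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Topic 1: user id -> whole record (feeds the first GlobalKTable).
        source.map((k, v) -> KeyValue.pair(extractId(v), v))
              .to("users-by-id", Produced.with(Serdes.String(), Serdes.String()));

        // Topic 2: one (clientId, userId) record per client in the value.
        // A downstream KTable aggregation can collect the user ids per client
        // before the second GlobalKTable is built from its output.
        source.flatMap((k, v) -> extractClients(v).stream()
                      .map(c -> KeyValue.pair(c, extractId(v)))
                      .collect(Collectors.toList()))
              .to("users-by-client", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```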