
I currently have a simple topology:

    KStream<String, Event> eventsStream = builder.stream(sourceTopic);
    eventsStream.transformValues(processorSupplier, "nameCache")
        .to(destinationTopic);

My events sometimes have a key/value pair and other times have just the key. I want to add the value to the events that are missing it. I have this working fine with a local state store, but when I add more tasks, the key/value events and the key-only events sometimes end up in different threads, so they aren't updated correctly.
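For reference, here is a minimal sketch of the local-store approach described above, written against the Kafka Streams 2.x API used in the question. The `Event` class with a nullable `name` field is hypothetical (the question doesn't show `Event`), and `FillMissingName` is a hypothetical stand-in for the question's `processorSupplier`. Because `nameCache` is a local store, each task only sees values that passed through its own partitions, which is why key-only events can miss their value once processing is spread across several tasks.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class LocalCacheExample {
    /** Hypothetical event type: name is null when the event arrives with just the key. */
    static final class Event {
        final String name;
        Event(final String name) { this.name = name; }
    }

    static final String STORE_NAME = "nameCache";

    /** Hypothetical transformer: remembers seen values and fills in missing ones. */
    static class FillMissingName implements ValueTransformerWithKey<String, Event, Event> {
        private KeyValueStore<String, String> cache;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            // Local store: only holds data for the partitions assigned to this task.
            cache = (KeyValueStore<String, String>) context.getStateStore(STORE_NAME);
        }

        @Override
        public Event transform(final String key, final Event event) {
            if (event == null) {
                return null;                  // pass tombstones through unchanged
            }
            if (event.name != null) {
                cache.put(key, event.name);   // key/value event: remember the value
                return event;
            }
            // Key-only event: look up a value seen earlier by this task only.
            return new Event(cache.get(key));
        }

        @Override
        public void close() { }
    }

    static Topology build(final String sourceTopic, final String destinationTopic) {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore(STORE_NAME), Serdes.String(), Serdes.String()));
        final KStream<String, Event> eventsStream = builder.stream(sourceTopic);
        eventsStream.transformValues(FillMissingName::new, STORE_NAME)
            .to(destinationTopic);
        return builder.build();
    }
}
```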

I'd like to use a global state store for this but I'm having difficulty figuring out how to update the global store when new key/value pairs come in. I've created a global state store with the following code:

    builder.addGlobalStore(stateStore, "global_store", Consumed.with(Serdes.String(), Serdes.String()), new ProcessorSupplier<String, String>() {
      @Override
      public Processor<String, String> get() {
        return new Processor<String, String>() {
          private ProcessorContext context;
          @Override
          public void init(final ProcessorContext processorContext) {
            this.context = processorContext;
          }

          @Override
          public void process(final String key, final String value) {
            context.forward(key, value);
          }

          @Override
          public void close() {
          }
        };
      }
    });

As far as I can tell, it is working but since there is no data in the topic, I'm not sure.

So my question is: how do I update the global store from inside `transformValues()`? `store.put()` fails with an error saying that the global store is read-only.

I found "Write to GlobalStateStore on Kafka Streams", but the accepted answer there just says to update the underlying topic, and I don't see how I can do that since the topic isn't part of my stream.

---Edited---

I updated the code per #1 in the accepted answer. I can see the new key/value pairs show up in the `global_store` topic, but the global store doesn't seem to see the new keys. If I restart the application, it fills the cache with the data in the topic, but new keys aren't visible until I stop and restart the application.

I added logging to `process(String, String)` in the global store processor, and it shows new keys being processed. Any ideas?

  • I updated the answer since I made some wrong assumptions; I added a solution using the Streams DSL. – Tuyen Luong Apr 17 '20 at 16:47
  • I switched from using `builder.addGlobalStore` to using `builder.globalTable` and now everything is working. I don't know what I did wrong in the global store. – snicher_natrev Apr 17 '20 at 19:46
  • In your global `Processor`, calling `context.forward()` is a no-op -- there is no downstream processor. Instead, you should get the state store from the `context` in `init()` and call `store.put()` to put the data into the global store. If you don't call `put()`, nothing will be added to the store during runtime. On startup, a different code path is executed during the "pre-loading phase" for the global store, and thus you could see the data after a restart. (Cf. https://issues.apache.org/jira/browse/KAFKA-7663) – Matthias J. Sax Apr 18 '20 at 21:51
  • @MatthiasJ.Sax, global state stores appear to be read only. If I call `put()`, I get the error `java.lang.UnsupportedOperationException: Global store is read only` – snicher_natrev Apr 19 '20 at 03:23
  • `transformValues()` cannot write to it, but the `Processor` provided to `addGlobalStore()` can. That `Processor` is responsible for maintaining the store, and it's the only one with write access (note that it's only allowed to put the input key/value pair into the store unmodified; cf. KAFKA-7663, linked above, for the "why"). – Matthias J. Sax Apr 19 '20 at 03:56
  • @MatthiasJ.Sax, good to know. – snicher_natrev Apr 20 '20 at 15:12
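Following the comments above, a minimal sketch of a global-store state-update processor that actually writes to the store. It assumes the Kafka Streams 2.x `Processor` API from the question and an in-memory store named `nameCache` (the store name is an assumption; the `global_store` topic name comes from the question). The processor fetches the store in `init()` and calls `put()` with the record unchanged instead of `context.forward()`. Newer Kafka Streams versions replace this `Processor` interface with `org.apache.kafka.streams.processor.api.Processor`, so the signatures would differ there.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class GlobalStoreExample {
    static final String GLOBAL_TOPIC = "global_store"; // topic name from the question
    static final String STORE_NAME = "nameCache";      // assumed store name

    /** State-update processor: the only processor allowed to write to the global store. */
    static class GlobalStoreUpdater implements Processor<String, String> {
        private KeyValueStore<String, String> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            store = (KeyValueStore<String, String>) context.getStateStore(STORE_NAME);
        }

        @Override
        public void process(final String key, final String value) {
            // Write the input record unmodified (cf. KAFKA-7663).
            // context.forward() would be a no-op: there is no downstream node.
            store.put(key, value);
        }

        @Override
        public void close() { }
    }

    static Topology build() {
        final StreamsBuilder builder = new StreamsBuilder();
        // A global store is restored from its source topic, so changelogging must be off.
        final StoreBuilder<KeyValueStore<String, String>> storeBuilder =
            Stores.keyValueStoreBuilder(
                    Stores.inMemoryKeyValueStore(STORE_NAME), Serdes.String(), Serdes.String())
                .withLoggingDisabled();
        builder.addGlobalStore(storeBuilder, GLOBAL_TOPIC,
            Consumed.with(Serdes.String(), Serdes.String()),
            GlobalStoreUpdater::new);
        return builder.build();
    }
}
```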

1 Answer

  1. Inside `transformValues()` you only get read-only access to a global state store. If you want to update the global store, then yes, you have to send the update to the store's underlying input topic; the store is updated when that message is consumed. The reason is that a global state store is populated on every application instance and uses this input topic for fault tolerance (it has no separate changelog topic). You can do this by branching your topology:

         KStream<String, Event> eventsStream = builder.stream(sourceTopic);

         // process messages as normal
         eventsStream.transformValues(processorSupplier, "nameCache")
                 .to(destinationTopic);

         // turn each event into the key/value update for the global store
         // and write it to the global store's input topic
         eventsStream.transform(updateGlobalStateProcessorSupplier, "nameCache")
                 .to("global_store", Produced.with(Serdes.String(), Serdes.String()));

  2. Use the low-level Processor API to construct your `Topology` manually. That way a single processor can forward records to both your `destinationTopic` sink and the `global_store` sink, using `ProcessorContext.forward()` with the name of the target sink node.
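A minimal sketch of option 2 with the 2.x Processor API: one processor forwards each record to two named sink nodes via `To.child()`. The node names, the plain `String` values, and the routing rule (only events that carry a value go to `global_store`) are assumptions for illustration; the enrichment step and the global store registration are omitted.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.To;

public class ManualTopologyExample {
    // Hypothetical node names.
    static final String SOURCE = "events-source";
    static final String ENRICHER = "enricher";
    static final String DEST_SINK = "destination-sink";
    static final String GLOBAL_SINK = "global-store-sink";

    /** Routes each record to one or both sinks by name (enrichment omitted). */
    static class EnrichAndUpdate extends AbstractProcessor<String, String> {
        @Override
        public void process(final String key, final String value) {
            // Every event goes to the destination topic.
            context().forward(key, value, To.child(DEST_SINK));
            // Events that carry a value also update the global store's input topic.
            if (value != null) {
                context().forward(key, value, To.child(GLOBAL_SINK));
            }
        }
    }

    static Topology build(final String sourceTopic, final String destinationTopic) {
        final Topology topology = new Topology();
        topology.addSource(SOURCE,
            Serdes.String().deserializer(), Serdes.String().deserializer(), sourceTopic);
        topology.addProcessor(ENRICHER, EnrichAndUpdate::new, SOURCE);
        topology.addSink(DEST_SINK, destinationTopic,
            Serdes.String().serializer(), Serdes.String().serializer(), ENRICHER);
        topology.addSink(GLOBAL_SINK, "global_store",
            Serdes.String().serializer(), Serdes.String().serializer(), ENRICHER);
        return topology;
    }
}
```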
Tuyen Luong
  • Exactly, `transformValues()` does not allow you to use `context.forward()`. "Forward" here means sending to a named processor node, which requires constructing the `Topology` yourself instead of using the Streams DSL; [take a look at this](https://kafka.apache.org/documentation/streams/developer-guide/processor-api.html#connecting-processors-and-state-stores). – Tuyen Luong Apr 17 '20 at 16:34
  • I accepted this answer as it does get the messages being written to the topic. But I'm now not seeing new keys until I restart the application. I updated the question with that info. Any ideas? – snicher_natrev Apr 17 '20 at 17:33
  • @snicher_natrev so it's updated with new keys when using `GlobalKTable`, but not when you build the global store with the `StreamsBuilder#addGlobalStore()` API? – Tuyen Luong Apr 18 '20 at 01:28
  • That is correct. The `StreamsBuilder#addGlobalStore()` API would only update when I restarted the application and it synced with the topic. While the application was running, no new keys or updated values would appear in the store. – snicher_natrev Apr 18 '20 at 16:59
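For comparison, a minimal sketch of the `globalTable()` route the asker ended up using: the DSL maintains the global store itself, and a `leftJoin` against the `GlobalKTable` fills in missing values. The `Event` class is hypothetical, the `global_store` topic is assumed to still be fed as in option 1, and materializing the table as `nameCache` is only needed if you also want to query the store by name.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class GlobalTableExample {
    /** Hypothetical event type: name is null when the event arrives with just the key. */
    static final class Event {
        final String name;
        Event(final String name) { this.name = name; }
    }

    static Topology build(final String sourceTopic, final String destinationTopic) {
        final StreamsBuilder builder = new StreamsBuilder();
        // The DSL creates the global store and keeps it updated while running.
        final GlobalKTable<String, String> names = builder.globalTable(
            "global_store",
            Consumed.with(Serdes.String(), Serdes.String()),
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("nameCache"));

        final KStream<String, Event> eventsStream = builder.stream(sourceTopic);
        eventsStream
            .leftJoin(names,
                (key, event) -> key, // look up the global table by the event's key
                (event, cachedName) -> event.name != null ? event : new Event(cachedName))
            .to(destinationTopic);
        return builder.build();
    }
}
```

Because every instance holds a full copy of a global table, the lookup works no matter which task processes the key-only event.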