A custom processor which buffers events in a simple java.util.List in process() - this buffer is not a state store.

Every 30 seconds of WALL_CLOCK_TIME, punctuate() sorts this list and flushes it to the sink. Assume a single-partition source and sink. The EOS processing guarantee is required.

I know that at any given time either process() gets executed or punctuate() gets executed.

I am concerned that this buffer is not backed by a changelog topic. I believe it should have been a state store in order to support EOS.

But there is an argument that setting the commit interval (commit.interval.ms) to more than 30 seconds, say 40 seconds, will make sure that the events in the buffer are never lost. And since we are using WALL_CLOCK_TIME, punctuate() will always be called every 30 seconds regardless of whether we have events or not.

Is this a valid argument? What are the cases here that will make the events in the buffer lost forever?
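For concreteness, the configuration that argument relies on might look roughly like the sketch below. It uses plain string keys so it compiles without the Kafka client libraries. The application id and bootstrap address are made-up placeholders, and `EosConfigSketch` and `proposedConfig` are hypothetical names.

```java
import java.util.Properties;

// Sketch of the settings under discussion; not taken from a real deployment.
class EosConfigSketch {

    static Properties proposedConfig() {
        Properties props = new Properties();
        props.put("application.id", "buffering-processor-app"); // hypothetical placeholder
        props.put("bootstrap.servers", "localhost:9092");       // hypothetical placeholder
        // Exactly-once processing (newer Kafka releases also offer "exactly_once_v2").
        props.put("processing.guarantee", "exactly_once");
        // Commit every 40 s, i.e. longer than the 30 s punctuation interval.
        props.put("commit.interval.ms", "40000");
        return props;
    }
}
```

Note that with exactly-once enabled the default commit interval is 100 ms, so the 40-second value would be an explicit override.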

@Override
public void init(ProcessorContext processorContext) {
    super.init(processorContext);
    this.buffer = new ArrayList<>();
    context().schedule(Duration.ofSeconds(20L), PunctuationType.WALL_CLOCK_TIME, this::flush);
}

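void flush(long timestamp){
    LOG.info("Punctuator invoked.....");
    // Forward the buffered events downstream in id order.
    buffer.stream().sorted(Comparator.comparing(o -> o.getId())).forEach(
            i -> context().forward(i.getId(), i)
    );
    // Empty the buffer so the same events are not forwarded again on the next punctuation.
    buffer.clear();
}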

@Override
public void process(String key, Customer value) {
    LOG.info("Processing {}", key);
    buffer.add(value);
}
  • If the application encounters an unrecoverable error that is not or cannot be handled, that list, which is neither persisted nor distributed, would be lost. – Nic Pegg Jul 01 '20 at 04:58
  • I recommend using a Kafka state store such as KeyValueStore so that your stream is fault tolerant. With your current implementation, you might lose messages from time to time. Please take a look at this [similar post](https://stackoverflow.com/a/49389721/2335775) – Vasyl Sarzhynskyi Jul 01 '20 at 05:18
  • As an alternative to List, you can use a *persistent* `KeyValueStore` that allows you to iterate over it in *insertion order* to mimic the behaviour of a `List` – JavaTechnical Jul 01 '20 at 06:12

1 Answer

I worked out a few arguments against tuning the commit and punctuation intervals and calling this setup foolproof.

From the docs, on WALL_CLOCK_TIME:

This is best effort only as its granularity is limited by how long an iteration of the processing loop takes to complete

It's possible to "miss" a punctuation if: with PunctuationType#WALL_CLOCK_TIME, on GC pause, too short interval

Ideal:

punctuate : |-------20s-------|-------20s-------|------20s-------|------20s------|

commit    : |------------30s------------|------------30s-----------|------------30s---

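Say process() takes too long (18 seconds, for example), so punctuate() is not invoked for its second run at the 40th second. As the docs mention, a punctuation can be missed when the interval is too short.

Now suppose the application crashes at the 31st second. The commit at 30 seconds has already committed the source offsets of the events that are still sitting in the buffer, so even with EOS enabled they will not be re-read. On restart, the buffer is gone and those events are lost.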

punctuate : |-------20s-------|------process()---------20s-------|------20s------|

commit    : |------------30s------------|------------30s-------------|------------30s---
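The timeline can be checked with a small toy model. This is only a sketch and not Kafka Streams code: it assumes one event arrives per second, that an event's id equals its arrival second, and that punctuation and commit fire exactly on their intervals. `BufferLossSimulation` and `lostEvents` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: one event per second, event id == arrival second.
class BufferLossSimulation {

    /** Returns the events whose source offsets were committed but that never reached the sink. */
    static List<Integer> lostEvents(int punctuateEverySec, int commitEverySec, int crashAtSec) {
        List<Integer> buffer = new ArrayList<>(); // in-memory only, like the List in the question
        int committedOffset = 0;                  // id of the last event whose offset is committed

        for (int t = 1; t < crashAtSec; t++) {
            buffer.add(t);                        // process(): buffer the event that arrived at second t
            if (t % punctuateEverySec == 0) {
                buffer.clear();                   // punctuate(): everything buffered goes to the sink
            }
            if (t % commitEverySec == 0) {
                committedOffset = t;              // commit: source offsets up to t are now durable
            }
        }

        // Crash: the buffer is gone. After restart only events after committedOffset are re-read,
        // so anything still buffered at or below that offset is lost.
        List<Integer> lost = new ArrayList<>();
        for (int event : buffer) {
            if (event <= committedOffset) {
                lost.add(event);
            }
        }
        return lost;
    }
}
```

Under these assumptions, `lostEvents(20, 30, 31)` returns events 21 through 30: they were buffered after the punctuation at 20 seconds and their offsets were committed at 30 seconds. `lostEvents(30, 40, 41)` returns events 31 through 40, so in this model the 30-second/40-second setting from the question loses events in the same way, even when no punctuation is missed.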

Hence, the argument that tuning the commit and punctuation intervals removes the need for a state store is not valid.