
I have a Processor that interacts with a StateStore to filter and apply complex logic to the messages. In the `process(key, value)` method I use `context.forward(key, value)` to send the keys and values that I need. For debugging purposes I also print those.

I have a KStream `mergedStream` that results from a join of two other streams. I want to apply the processor to the records of that stream. I achieve this with `mergedStream.process(myProcessorSupplier, "stateStoreName")`.

When I start this program, I can see the proper values printed to my console. However, if I send `mergedStream` to a topic using `mergedStream.to("topic")`, the values on the topic are not the ones I forwarded in the processor, but the original ones.

I use kafka-streams 0.10.1.0.

What is the best way to get the values I forwarded in the processor into another stream?

Is it possible to mix the Processor API with the streams created by the KStream DSL?

Tony

1 Answer


Short:

To solve your problem, use `transform(...)` instead of `process(...)`; it gives you access to the Processor API within the DSL, too.

Long:

If you use `process(...)` you apply a processor to a stream -- however, this is a "terminating" (or sink) operation: its return type is `void`, i.e., it does not return any result. (Here, "sink" only means that the operator has no successor -- it does not imply that any result is written anywhere!)

Furthermore, if you call `mergedStream.process(...)` and `mergedStream.to(...)`, you basically branch-and-duplicate your stream and send one copy to each downstream operator (i.e., one copy to `process` and one copy to `to`).
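As a sketch of that branching (`myProcessorSupplier` stands in for your actual processor supplier):

```java
// Both operators receive their own copy of every record in mergedStream:
mergedStream.process(myProcessorSupplier, "stateStoreName"); // terminal: forwarded records never re-enter the DSL
mergedStream.to("topic");                                    // writes the *original* records, not the forwarded ones
```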

Mixing the DSL and the Processor API is absolutely possible (you did it already ;)). However, using `process(...)` you cannot consume the data you `forward(...)` within the DSL -- if you want to consume Processor API results, use `transform(...)` instead of `process(...)`.
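A minimal sketch of the `transform(...)` approach, matching the 0.10.1.0 `Transformer` interface. The store name `"stateStoreName"` and the filtering/logic body are placeholders for whatever your Processor currently does; requires kafka-streams on the classpath:

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

// Returning a KeyValue from transform() is the DSL-visible equivalent of
// calling context.forward() in a Processor.
class MyTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // same state store your Processor used
        store = (KeyValueStore<String, String>) context.getStateStore("stateStoreName");
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        // ...complex logic against the store...
        // return the record you previously passed to context.forward(),
        // or null to drop the record entirely
        return KeyValue.pair(key, value);
    }

    @Override
    public KeyValue<String, String> punctuate(long timestamp) {
        return null; // nothing to emit on punctuation
    }

    @Override
    public void close() {}
}
```

Wiring it into the DSL then yields a `KStream` you can keep working with:

```java
KStream<String, String> result =
    mergedStream.transform(MyTransformer::new, "stateStoreName");
result.to("topic"); // the topic now receives the transformed records
```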

Matthias J. Sax
  • How would you use the DSL if in your Processor API code you forward to more than one topic (outputTopic1, outputTopic2)? Could you create `val stream1 = builder.stream(outputTopic)` and `val stream2 = builder.stream(outputTopic2)` and build from there? – xmar May 09 '18 at 22:25
  • If you write to two different topics via the Processor API, yes, you can read each topic back as a KStream as you describe. Note that if you call `context.forward()` you don't write to any topic but forward to a named downstream processor. Not sure what you want to achieve. Maybe it's worth asking a separate question :) – Matthias J. Sax May 09 '18 at 23:48
  • Right, I forgot to mention that my forwards go to sinks for two different topics (which will be consumed by other applications too). Implications for new subtopologies come in, so I'll make it a question, since I think it's relevant. Thanks! – xmar May 11 '18 at 07:47
  • Will there be a way to continue with the DSL after using `.process()`? It could be so useful. I understand it can be challenging, especially regarding the unknown next node from the processor, but maybe there's a way to circumvent that! – Renato Mefi Mar 14 '20 at 17:50
  • It will not be possible. `process()` is designed to be a terminal operation. If you want to continue, you can use `transform()` (or `flatTransform()`, `transformValues()`, or `flatTransformValues()`) -- note that the only difference between `transform()` and `process()` is terminal vs. non-terminal. I.e., by picking `process()` you explicitly decide not to continue! If you want to continue, pick `transform()`. – Matthias J. Sax Mar 14 '20 at 18:21