I need to parse complex messages on Kafka using multiple transformers. Each transformer parses a part of the message and fills some attributes on it. In the end, the fully parsed message is stored in the database by a Kafka consumer. Currently, I'm doing this:
streamsBuilder.stream(Topic.A, someConsumer)
    // filters messages that have unparsed parts of type X
    .filter(filterX)
    // transformer that edits the message and produces new Topic.E messages
    .transform(ParseXandProduceE::new)
    .to(Topic.A, someProducer);

streamsBuilder.stream(Topic.A, someConsumer)
    // filters messages that have unparsed parts of type Y
    .filter(filterY)
    // transformer that edits the message and produces new Topic.F messages
    .transform(ParseYandProduceF::new)
    .to(Topic.A, someProducer);
A Transformer looks like this:
class ParseXandProduceE implements Transformer<String, Message, KeyValue<String, Message>> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<String, Message> transform(String key, Message message) {
        message.x = parse(message.rawX);
        context.forward(newKey, message.x, To.child(Topic.E));
        return KeyValue.pair(key, message);
    }
}
However, this is cumbersome: the same messages flow through these streams multiple times.
Additionally, there is a consumer that stores messages of Topic.A in the database. Messages are currently stored multiple times: before each transformation and after each transformation. Each message must be stored exactly once.
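For illustration, one way the persisting consumer could avoid duplicate writes is to store a message only once it is complete. This is a minimal plain-Java sketch of such a check; the `Message` shape here is hypothetical, mirroring only the fields used in the question:

```java
// Hypothetical message shape, mirroring the fields used in the question;
// not the real Message class.
class Message {
    String rawX, rawY; // raw, unparsed parts (null if the part is absent)
    String x, y;       // attributes filled in by the transformers
}

class PersistenceFilter {
    // A message is ready for the database only when every raw part
    // it carries has been parsed into its corresponding attribute.
    static boolean isFullyParsed(Message m) {
        return (m.rawX == null || m.x != null)
            && (m.rawY == null || m.y != null);
    }
}
```

The consumer would then skip any record for which `isFullyParsed` returns false, so partially parsed intermediates never reach the database.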
The following could work, but seems unfavorable, since each filter+transform block could otherwise have been put cleanly into its own separate class:
streamsBuilder.stream(Topic.A, someConsumer)
    // transformer that filters and edits the message and produces new Topic.E + Topic.F messages
    .transform(someTransformer)
    .to(Topic.B, someProducer);
and make the persistence consumer listen to Topic.B.
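To make the trade-off concrete, here is a minimal plain-Java sketch of what `someTransformer` would do in a single pass, independent of the Kafka Streams API. The class names are hypothetical, and `parse` is a stand-in for the real per-part parsing logic:

```java
// Hypothetical message shape, mirroring the fields used above.
class Message {
    String rawX, rawY; // raw, unparsed parts (null if the part is absent)
    String x, y;       // attributes filled by parsing
}

// One transformer pass that handles every part type, so each message
// flows through the stream once instead of once per part.
class CombinedParser {
    // Stand-in for the real parsing logic.
    static String parse(String raw) {
        return raw.toUpperCase();
    }

    // Fills every attribute that still has an unparsed raw counterpart.
    // Returns true if anything changed, i.e. the transformer should
    // forward derived records (Topic.E / Topic.F) downstream.
    static boolean parseAll(Message m) {
        boolean changed = false;
        if (m.x == null && m.rawX != null) {
            m.x = parse(m.rawX);
            changed = true;
        }
        if (m.y == null && m.rawY != null) {
            m.y = parse(m.rawY);
            changed = true;
        }
        return changed;
    }
}
```

Inside the actual Transformer, `transform()` would call `parseAll(message)` and use `context.forward(...)` to emit the Topic.E / Topic.F records for whichever parts were parsed, before returning the fully parsed message for Topic.B.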
Is the latter proposed solution the way to go, or is there some other way to achieve the same result? Maybe with a complete Topology configuration of Sources and Sinks? If so, what would that look like for this scenario?