I need to build the following graph:
```scala
val graph = getFromTopic1 ~> doSomeWork ~> writeToTopic2 ~> commitOffsetForTopic1
```
But trying to implement it in Reactive Kafka has sent me down a rabbit hole. That seems wrong, because this strikes me as a relatively common use case: I want to move data between Kafka topics while guaranteeing At Least Once Delivery Semantics (ALOS).
Writing in parallel, on the other hand, is no problem at all:
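```scala
// Inside GraphDSL.create() { implicit builder => ... }; Msg is a placeholder for doSomeWork's output type
val fanOut = builder.add(Broadcast[Msg](2))
getFromTopic1 ~> doSomeWork ~> fanOut ~> writeToTopic2
                               fanOut ~> commitOffsetForTopic1
```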
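This code works because `writeToTopic2` can be implemented with `ReactiveKafka#publish(..)`, which returns a `Sink`. But then I lose the ALOS guarantee, and thus data, when my app crashes: nothing makes the commit wait for the write, so an offset can be committed for a message that never reached topic 2.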
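To make that failure mode concrete, here is a minimal stdlib-only Scala simulation, assuming a toy model with no Kafka or Akka involved: topic 1 is a `Vector`, topic 2 is a buffer, and a crash is a write that fails at offset 1. `Msg`, `run` and `simulate` are hypothetical names. With `commitInParallel = true` the offset is committed even though the write failed, which models the worst-case interleaving of the `Broadcast` graph.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model: topic 1 is a Vector, topic 2 a buffer.
final case class Msg(offset: Long, value: String)

val topic1 = Vector(Msg(0L, "a"), Msg(1L, "b"), Msg(2L, "c"))

// Consumes topic1 from `from`; the write fails (and the app "crashes") at `failAt`.
// With commitInParallel the offset is committed even when the write fails
// (worst case of the Broadcast graph); otherwise only after a successful write.
// Returns the offset to resume from after a restart.
def run(topic2: ArrayBuffer[String], from: Long, failAt: Option[Long], commitInParallel: Boolean): Long = {
  var committed = from - 1
  var crashed = false
  for (m <- topic1) {
    if (!crashed && m.offset >= from) {
      val writeOk = !failAt.contains(m.offset)
      if (writeOk) topic2 += m.value.toUpperCase            // doSomeWork ~> writeToTopic2
      if (commitInParallel || writeOk) committed = m.offset // commitOffsetForTopic1
      if (!writeOk) crashed = true                          // app dies, stream stops
    }
  }
  committed + 1
}

// Crash at offset 1, then restart from the committed position.
def simulate(commitInParallel: Boolean): Seq[String] = {
  val topic2 = ArrayBuffer.empty[String]
  val resumeAt = run(topic2, from = 0L, failAt = Some(1L), commitInParallel = commitInParallel)
  run(topic2, from = resumeAt, failAt = None, commitInParallel = commitInParallel)
  topic2.toList
}

println(simulate(commitInParallel = true))  // "B" is lost
println(simulate(commitInParallel = false)) // "A", "B" and "C" all arrive
```

The only difference between the two runs is whether the commit waits for the write, which is exactly the ordering the sequential `writeToTopic2 ~> commitOffsetForTopic1` graph is meant to enforce.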
So what I really need is a `Flow` that writes to a Kafka topic. I have tried using `Flow.fromSinkAndSource(..)` with a custom `GraphStage`, but I run up against type issues for the data flowing through. For example, what gets committed in `commitOffsetForTopic1` should not be included in `writeToTopic2`, so I have to carry a wrapper object containing both pieces of data all the way through. But this conflicts with the requirement that `writeToTopic2` accept a `ProducerMessage[K,V]`. My latest attempt to resolve this ran up against private and final classes in the reactive-kafka library (extending, wrapping, or replacing the underlying `SubscriptionActor`).
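The wrapper idea itself can be modeled without any Kafka types. In this stdlib-only Scala sketch, `Record` is a hypothetical stand-in for `ProducerMessage[K,V]`, `Envelope` is a hypothetical wrapper that carries the topic-1 offset as an opaque pass-through value, and `send` is a hypothetical asynchronous writer. The write step only ever hands the record to the writer, and it emits the pass-through value only once the write has succeeded, so a commit placed after it cannot run ahead of the write.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for ProducerMessage[K, V]: the only thing written to topic 2.
final case class Record[K, V](key: K, value: V)

// Hypothetical wrapper: the record plus an opaque pass-through value
// (here the topic-1 offset) that the writer never inspects.
final case class Envelope[K, V, P](record: Record[K, V], passThrough: P)

// Sends only the record, then yields the pass-through value once the write succeeded.
def writeThenEmit[K, V, P](env: Envelope[K, V, P], send: Record[K, V] => Future[Unit])(
    implicit ec: ExecutionContext): Future[P] =
  send(env.record).map(_ => env.passThrough)

implicit val ec: ExecutionContext = ExecutionContext.global
val topic2 = ArrayBuffer.empty[Record[String, String]]
val committed = ArrayBuffer.empty[Long]

// Hypothetical writer: fails for empty values to simulate a broken write.
val send: Record[String, String] => Future[Unit] = r =>
  if (r.value.isEmpty) Future.failed(new RuntimeException("write failed"))
  else Future { topic2 += r; () }

// doSomeWork ~> writeToTopic2 ~> commitOffsetForTopic1 for one message;
// returns false if the write (and therefore the commit) did not happen.
def process(offset: Long, value: String): Boolean = {
  val env = Envelope(Record("key", value.toUpperCase), offset)
  val done = writeThenEmit(env, send).map(o => committed += o)
  Await.ready(done, 5.seconds).value.exists(_.isSuccess)
}

// Like a stream that fails on error: stop at the first failed write.
val inputs = Seq(0L -> "a", 1L -> "", 2L -> "c")
inputs.iterator.map { case (o, v) => process(o, v) }.takeWhile(identity).foreach(_ => ())

println(topic2)    // only the record for offset 0
println(committed) // only offset 0, so a restart would resume at offset 1
```

Inside an Akka Streams graph, a function shaped like `writeThenEmit` could be lifted into a stage with `Flow[Envelope[K, V, P]].mapAsync(1)(...)`, which preserves element order; the sketch only illustrates that the writer's input type carries the offset alongside the record rather than being the record alone.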
I don't really want to maintain a fork to make this happen. What am I missing? Why is this so hard? Am I somehow trying to build a pathological graph node, or is this use case an oversight... or is there something completely obvious I have somehow missed in the docs and source code I've been digging through?
I'm using reactive-kafka 0.10.1. I can add more detailed information about any of my many attempts on request.