
I'm trying a variation of connecting a producer to a consumer, with the special case that sometimes I need to produce one extra message per input message (e.g. one to the output topic and one to a different topic) while keeping the delivery guarantees intact.

I was thinking of using `mapConcat` and outputting multiple `ProducerRecord` objects, but I'm concerned about losing the guarantees in the edge case where producing the first message is enough for the commit to happen on that offset, causing a potential loss of the second. Also, it seems you can't just use `flatMap`, as that takes you into the graph API, which gets even muddier: once you merge back into a commit flow, it becomes harder to make sure you don't simply ignore the duplicated offset.

Consumer.committableSource(consumerSettings, Subscriptions.topics(inputTopic))
  .map(msg => (msg, addLineage(msg.record.value())))
  .mapConcat { case (msg, value) =>
    // Most of the time emit one message; sometimes emit a second one to
    // outputTopic2. Both carry the same committable offset.
    if (math.random > 0.25)
      List(
        ProducerMessage.Message(
          new ProducerRecord[Array[Byte], Array[Byte]](outputTopic, msg.record.key(), value),
          msg.committableOffset
        )
      )
    else
      List(
        ProducerMessage.Message(
          new ProducerRecord[Array[Byte], Array[Byte]](outputTopic, msg.record.key(), value),
          msg.committableOffset
        ),
        ProducerMessage.Message(
          new ProducerRecord[Array[Byte], Array[Byte]](outputTopic2, msg.record.key(), value),
          msg.committableOffset
        )
      )
  }
  .via(Producer.flow(producerSettings))
  .map(_.message.passThrough)
  .batch(max = 20, first => CommittableOffsetBatch.empty.updated(first)) {
    (batch, elem) => batch.updated(elem)
  }
  .mapAsync(parallelism = 3)(_.commitScaladsl())
  .runWith(Sink.ignore)

The documentation for the original 1-to-1 case is here: https://doc.akka.io/docs/akka-stream-kafka/current/consumer.html#connecting-producer-and-consumer

Has anyone thought of / solved this problem?

fd8s0
  • It's not clear why `mapConcat` isn't the solution. What "loose guarantees"? What is that code sample of? – Ramón J Romero y Vigil Oct 01 '18 at 18:20
  • @RamonJRomeroyVigil it's to deal with the scenario where you split into 2 messages, deliver the first, and then your application fails: you've failed to deliver the second message while potentially having committed the offset of the first (and so both). – fd8s0 Oct 09 '18 at 09:31

1 Answer


The Alpakka Kafka connector has recently introduced `flexiFlow`, which supports your use case: it lets one stream element produce multiple messages to Kafka.
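
As a minimal sketch (not from the answer itself), here is how the question's example might look with `Producer.flexiFlow` and `ProducerMessage.multi`, assuming a version of Alpakka Kafka that ships them (around 1.0); `consumerSettings`, `producerSettings`, `addLineage`, the topic names, and the branching condition are all carried over from the question:

Consumer.committableSource(consumerSettings, Subscriptions.topics(inputTopic))
  .map { msg =>
    val value = addLineage(msg.record.value())
    val records =
      if (math.random > 0.25)
        List(new ProducerRecord[Array[Byte], Array[Byte]](outputTopic, msg.record.key(), value))
      else
        List(
          new ProducerRecord[Array[Byte], Array[Byte]](outputTopic, msg.record.key(), value),
          new ProducerRecord[Array[Byte], Array[Byte]](outputTopic2, msg.record.key(), value)
        )
    // One envelope, one offset: all records in the envelope share a single
    // committable offset, which is passed through together.
    ProducerMessage.multi(records, msg.committableOffset)
  }
  .via(Producer.flexiFlow(producerSettings))
  .map(_.passThrough)
  .batch(max = 20, first => CommittableOffsetBatch.empty.updated(first)) {
    (batch, elem) => batch.updated(elem)
  }
  .mapAsync(parallelism = 3)(_.commitScaladsl())
  .runWith(Sink.ignore)

Because both records travel in a single envelope with a single offset, the offset only reaches the committing stage once the whole envelope has been produced, so a failure between the two sends can't commit past the unsent record.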

Enno
  • This is not very clear to me. Are you saying you should be able to use flexiFlow, then produce again, then flexiFlow a second time to finally commit? There are no clear examples of this, nor is it very clear, at least to me, what exactly the point of flexiFlow is. – fd8s0 Oct 09 '18 at 13:46
  • No, flexiFlow accepts ProducerMessage.MultiMessage, which may contain a list of ProducerRecords to be produced (even to multiple topics). – Enno Oct 09 '18 at 17:34