I am currently building a solution to stream data from MongoDB to Elasticsearch. My goal is to keep track of all items successfully transmitted to Elasticsearch. I am using akka-streams and elastic4s. Currently the streaming into ES looks like this:

val esSubscriber: BulkIndexingSubscriber[CustomT] = esClient.subscriber[CustomT](
  batchSize = batchSize,
  completionFn = { () => elasticFinishPromise.success(()); () },
  errorFn = { (t: Throwable) => elasticFinishPromise.failure(t); () },
  concurrentRequests = concurrentRequests
)
val esSink: Sink[CustomT, NotUsed] = Sink.fromSubscriber(esSubscriber)

And the pipeline from my source looks something like this:

val a: NotUsed = mongoSrc
  .via(/* some operations... */)
  .to(esSink)
  .run()

Now everything works fine, and right now I am logging, for example, the item count with a second sink. But I would rather log the items that were actually transmitted to Elasticsearch. The elastic4s subscriber offers a listener: ResponseListener with onAck(): Unit and onFailure(): Unit, and I would love to get this information back into the stream, like this:

val mongoSrc: Source[...]
val doStuff: Flow[...]
val esSink: Flow[...]       // now a Flow instead of a Sink
val logSink: Sink[Int, ...] // now receives, for example, a 1 for each successfully transmitted item
mongoSrc ~> doStuff ~> esSink ~> logSink

How would I implement that? Do I need a custom stage that buffers the elements from onAck and onFailure? Or is there an easier way?

Thanks for any help.

rincewind
  • The Akka Streams reactive-kafka driver does something like this, maybe it could be inspirational to look at those sources: https://github.com/akka/reactive-kafka (the ProducerStage especially) – johanandren Jul 28 '16 at 05:46
  • thank you that looks pretty helpful! Trying this tomorrow – rincewind Jul 28 '16 at 20:03
  • Could you create another stream which is populated via the onAck method? – sksamuel Jul 29 '16 at 10:28
  • @monkjack yes, that is exactly what I am doing now: I created another stream with a source.queue and push to it from onAck. – rincewind Jul 31 '16 at 13:26

1 Answer

You could 'flowify' your Subscriber[T] sink by leveraging Flow.fromSinkAndSource. Check out the 'Composite Flow (from Sink and Source)' illustration from the docs.

In this case, you would attach your custom ActorPublisher as a source and send it messages from onAck().
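
A rough sketch of that shape, reusing esSink and CustomT from the question and assuming an ActorSystem in scope as system. The AckPublisher actor is hypothetical; ActorPublisher was the era-appropriate API for this (it has since been superseded by GraphStage). Creating the actor before the stream runs also sidesteps the materialization chicken-and-egg problem mentioned in the comments below.

import akka.NotUsed
import akka.actor.Props
import akka.stream.actor.{ActorPublisher, ActorPublisherMessage}
import akka.stream.scaladsl.{Flow, Source}

// Hypothetical publisher actor: buffers acks and emits one Int per
// acknowledged document, respecting downstream demand.
class AckPublisher extends ActorPublisher[Int] {
  import ActorPublisherMessage._
  private var buffer = Vector.empty[Int]

  def receive = {
    case n: Int =>
      buffer :+= n
      deliver()
    case Request(_) => deliver()
    case Cancel     => context.stop(self)
  }

  private def deliver(): Unit =
    while (totalDemand > 0 && buffer.nonEmpty) {
      onNext(buffer.head)
      buffer = buffer.tail
    }
}

// Create the actor up front so onAck() can reach it no matter when the
// stream materializes.
val ackActor = system.actorOf(Props[AckPublisher])

// Composite flow: elements go into the subscriber sink, acks come out of
// the publisher source. The two sides are linked only via the listener.
val esFlow: Flow[CustomT, Int, NotUsed] =
  Flow.fromSinkAndSource(esSink, Source.fromPublisher(ActorPublisher[Int](ackActor)))

// From the subscriber's ResponseListener.onAck: ackActor ! 1

With that in place, the graph reads mongoSrc ~> doStuff ~> esFlow ~> logSink, exactly as sketched in the question.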

Since you asked for an easier way:

val doStuff = Flow[DocToIndex]
  .grouped(batchSize)                          // batch documents for one bulk request
  .mapAsync(concurrentRequests)(bulkopFuture)  // bulk-index with bounded parallelism

In a nutshell, and all useful abstractions aside, the elastic4s subscriber is just a bulk update request.
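
To make the "easier way" concrete, here is a minimal sketch of what bulkopFuture and the resulting graph could look like, with DocToIndex standing in for the question's CustomT. Everything here is illustrative: the "myindex/mytype" string and the toFields(_) conversion are invented, the exact bulk DSL differs between elastic4s versions, and a real implementation should inspect the bulk response for per-item failures rather than assuming the whole batch was acked.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import akka.stream.scaladsl.{Keep, Sink}
import com.sksamuel.elastic4s.ElasticDsl._

// Hypothetical: index one batch with a single bulk request and report
// how many items went in.
def bulkopFuture(batch: Seq[DocToIndex]): Future[Int] =
  esClient.execute {
    bulk(batch.map(doc => index into "myindex/mytype" fields toFields(doc)))
  }.map(_ => batch.size) // naive: check the response for per-item failures

// The stream's materialized value now replaces completionFn/errorFn,
// and a fold plays the role of logSink:
val totalIndexed: Future[Int] =
  mongoSrc.via(doStuff).toMat(Sink.fold(0)(_ + _))(Keep.right).run()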

Rauan Mayemir
  • Note that now you don't need `esSink` juncture. Your graph will look like `mongoSrc ~> doStuff ~> logSink` – Rauan Mayemir Jul 30 '16 at 17:45
  • Thanks for your answer. I think your approach with the actorPublisher would work fine. I guess it would have to be a pre-materialization actorPublisher, because with the source.queue and the composite flow I run into the problem that I only get the source.queue once the graph is already materialized, and that's too late to inject it into the elastic4s subscriber. For now I just create another stream and let the onAck method push to the source.queue, and everything works fine, although I am still not 100% happy with the solution and would prefer a composite flow. – rincewind Jul 31 '16 at 13:25
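
For reference, the Source.queue workaround described in this last comment might look roughly like the sketch below. The names ackQueue and onAckHook and the buffer size are made up for illustration, and an implicit ActorMaterializer is assumed to be in scope:

import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Keep, Sink, Source}

// Materialize the logging stream first, so the queue handle already
// exists when the elastic4s subscriber is built.
val (ackQueue, ackedCount) = Source
  .queue[Int](bufferSize = 1024, OverflowStrategy.backpressure)
  .toMat(Sink.fold(0)(_ + _))(Keep.both) // logSink role: count acked items
  .run()

// Call this from the subscriber's ResponseListener.onAck:
def onAckHook(): Unit = {
  ackQueue.offer(1) // one element per successfully indexed document
  ()
}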