
I need to store multiple events to Cassandra and publish them to Kafka, and call some final handler() only after all events are stored and published.

I came across the "Update actor state only after all events are persisted" approach, but it doesn't cover the case where events should also be published to Kafka.

There is a Kafka publisher and a base aggregate-root actor that processes multiple events and then calls a handler() (typically used to return a response from the actor):

abstract class AggregateRootActor extends PersistentActor {

  def processEvents(events: Seq[Event])(handler: Event => Unit): Unit = {
    persistAll(events) { persistedEvent =>
      state = ??? // update actor state

      // publish the message to Kafka
      val futureResult = publisher.publishToKafka(persistedEvent)

      // where to switch context to handle `EventProcessingCompleted`
      // after all events are published?
      context.become {
        case EventProcessingCompleted => handler(persistedEvent)
        case ... // other messages
      }
    }
    self ! EventProcessingCompleted
  }
}

Any suggested solutions are welcome!

GoodPerson

2 Answers


I would structure it like this, assuming that you don't want the actor to reply until the event has been persisted to Cassandra (for future rehydration) and published to Kafka (presumably for broadcast to other systems).

// includes the event and anything else you'd want the handler to have,
//  e.g. where to send replies
case class EventProcessingCompleted(...)

// needs: import akka.pattern.pipe  and an implicit ExecutionContext
persistAll(events) { persistedEvent =>
  state = ???

  // Other state changes (e.g. becomes) here

  publisher.publishToKafka(persistedEvent)
    .map(_ => EventProcessingCompleted(persistedEvent))
    .pipeTo(self)
}
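Since `persistAll` will pipe one `EventProcessingCompleted` per event, here is a minimal sketch (not part of the original answer) of gating the handler on the whole batch by counting outstanding Kafka acks; `applyEvent` and the ack-counting behaviour are assumptions:

// Inside the aggregate actor. Assumes `publisher.publishToKafka(e)` returns
// a Future and `applyEvent` is the state transition (hypothetical names).
import akka.pattern.pipe

def processEvents(events: Seq[Event])(handler: Event => Unit): Unit = {
  import context.dispatcher

  persistAll(events) { persistedEvent =>
    state = applyEvent(state, persistedEvent)
    publisher.publishToKafka(persistedEvent)
      .map(_ => EventProcessingCompleted(persistedEvent))
      .pipeTo(self)
  }
  // Stack a behaviour that waits for one ack per persisted event
  // (assumes `events` is non-empty).
  context.become(awaitingAcks(events.size, handler), discardOld = false)
}

private def awaitingAcks(remaining: Int, handler: Event => Unit): Receive = {
  case EventProcessingCompleted(event) if remaining == 1 =>
    handler(event)      // last ack of the batch: reply and pop the behaviour
    context.unbecome()
  case EventProcessingCompleted(_) =>
    context.become(awaitingAcks(remaining - 1, handler), discardOld = true)
  // NB: a real implementation would also stash or handle other commands
  // and deal with publish failures and timeouts.
}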

An alternative, which is perhaps more honest about the consistency tradeoffs, would be to do the Kafka production by having the actor set up a stream from Akka Persistence Query to the Kafka producer, along these lines:

val readJournal =
  PersistenceQuery(actorSystem).readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

// Spin this up after recovery has completed
// (needs: import akka.pattern.ask  and an implicit Timeout)
val kafkaProductionStream =
  readJournal.eventsByPersistenceId(actorId, state.lastIdToKafka, Long.MaxValue)
    .mapAsync(1) { eventEnvelope =>
      publisher.publishToKafka(eventEnvelope.event.asInstanceOf[???])
        .map(_ => eventEnvelope.sequenceNr)
    }
    .mapAsync(1) { sequenceNr =>
      self ? RecordKafkaProductionFor(sequenceNr)
    }

// run the stream etc.

// persist the high-water mark for sequence numbers produced to Kafka and update state

// can now consider persistence to Cassandra to imply production to Kafka, so
//  can reply after persist to Cassandra
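As a hypothetical sketch of that high-water-mark step, the actor might complete the `self ? RecordKafkaProductionFor(...)` stage like this (the `KafkaHighWaterMark` event and `lastIdToKafka` field are invented names):

import akka.Done

case class RecordKafkaProductionFor(sequenceNr: Long)
case class KafkaHighWaterMark(sequenceNr: Long) extends Event

def receiveCommand: Receive = {
  case RecordKafkaProductionFor(sequenceNr) =>
    persist(KafkaHighWaterMark(sequenceNr)) { marker =>
      state = state.copy(lastIdToKafka = marker.sequenceNr)
      sender() ! Done // completes the `self ? ...` stage in the stream
    }
  // ... other commands elided
}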

To tighten up the guarantees around production to Kafka, it might be useful to have a component of the application (which could be a cluster singleton or sharded) that tracks when persistence IDs have been loaded and loads the least recently used persistence IDs, to ensure that the query stream keeps running.
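For illustration, a rough sketch of such a component; all names here (like `StreamKeeper`) are invented, and a real version would likely wake entities via Cluster Sharding:

import scala.concurrent.duration._
import akka.actor.{Actor, Timers}

case class Seen(persistenceId: String)
case object WakeLeastRecentlyUsed

// Tracks when each persistence ID was last seen alive and periodically
// wakes the quietest one so its journal-to-Kafka stream keeps running.
class StreamKeeper(wake: String => Unit) extends Actor with Timers {
  private var lastSeen = Map.empty[String, Long]

  timers.startTimerWithFixedDelay(WakeLeastRecentlyUsed, WakeLeastRecentlyUsed, 1.minute)

  def receive: Receive = {
    case Seen(pid) =>
      lastSeen += pid -> System.nanoTime()
    case WakeLeastRecentlyUsed =>
      // Waking the entity (re)starts it, which (re)starts its query stream.
      lastSeen.minByOption(_._2).foreach { case (pid, _) => wake(pid) }
  }
}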

Levi Ramsey
  • I like the first approach because of its simplicity. I believe the second is good enough too, but it is too complex for now. But I have a concern regarding the order of `EventProcessingCompleted` messages that are sent back to the actor. Let's say there is a behaviour that calls `handler(event)` upon every `EventProcessingCompleted(event)` message. Does that mean this handler will be triggered after the first received message, instead of being called after the last one? – GoodPerson Jun 24 '20 at 16:41
  • I did miss the usage of `persistAll` in the question, but that also calls the event handler multiple times. The simplest way I can think of is to include a list of event IDs for that batch in the `EventProcessingCompleted` message and have the state track which ones we're waiting for (probably clearing that part of the state on recovery and scheduling timeouts if all the triggered handler is doing is sending a reply). When implementing a saga (which is basically what this is, on a small scale), there's a lot of complexity (and this only scratches the surface...) – Levi Ramsey Jun 24 '20 at 17:14
  • Yeah, probably will track some ids to make sure that handler is called only after last event has been processed. Thanks for a tip! – GoodPerson Jun 26 '20 at 09:55
  • On that theme, there's a complexity point where you'd want to move the handler into its own persistent but temporary actor and have the main persistent actor only really care about its state. – Levi Ramsey Jun 26 '20 at 16:04
  • 1
    A rough outline in the main actor could be: on receipt of a valid command, spawn a transaction actor which knows where to reply etc., generate events, tell the transaction actor how many events are being persisted, `persistAll`, for each persisted event tell the transaction actor; stop the transaction actor if there's a persistence failure. Transaction actor then performs the relevant post-transaction actions after all the events have been persisted and probably stops itself (and maybe clears state) after some timeout. – Levi Ramsey Jun 26 '20 at 16:04
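To make that outline concrete, here is a rough, hypothetical sketch of such a transaction actor; all names are invented for illustration:

import scala.concurrent.duration._
import akka.actor.{Actor, ActorRef, ReceiveTimeout}

case class ExpectEvents(count: Int)
case class EventPersisted(event: Event)

// Spawned per command; knows where to reply, counts persisted events,
// and performs the post-transaction action once all have arrived.
class TransactionActor(replyTo: ActorRef) extends Actor {
  context.setReceiveTimeout(30.seconds)

  private var expected: Option[Int] = None
  private var seen = 0

  def receive: Receive = {
    case ExpectEvents(count) =>
      expected = Some(count)
      maybeComplete()
    case EventPersisted(_) =>
      seen += 1
      maybeComplete()
    case ReceiveTimeout =>
      context.stop(self) // give up; the command will have to be retried
  }

  private def maybeComplete(): Unit =
    if (expected.contains(seen)) {
      replyTo ! "OK" // the post-transaction action, e.g. the reply
      context.stop(self)
    }
}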

Actually, at the moment there is a component from Akka to realise this:

Akka Projections

I think that is what you want: publish to Kafka only after the events have been successfully persisted to Cassandra.

If you want to see how Akka Projections works and how to implement it, I wrote a blog post about it; you can find the implementation details there.
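For a sense of the shape of that, here is a minimal sketch of an Akka Projections setup along those lines, assuming a Cassandra journal with tagged events and Alpakka Kafka's `SendProducer`; the event type `OrderEvent`, the tag, and the topic name are all hypothetical:

import scala.concurrent.{ExecutionContext, Future}
import akka.Done
import akka.actor.typed.ActorSystem
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.SendProducer
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.projection.ProjectionId
import akka.projection.cassandra.scaladsl.CassandraProjection
import akka.projection.eventsourced.EventEnvelope
import akka.projection.eventsourced.scaladsl.EventSourcedProvider
import akka.projection.scaladsl.Handler
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.StringSerializer

final case class OrderEvent(id: String, payload: String) // hypothetical event type

// The projection handler runs only after the event is already in Cassandra.
class PublishToKafka(producer: SendProducer[String, String])(implicit ec: ExecutionContext)
    extends Handler[EventEnvelope[OrderEvent]] {
  override def process(envelope: EventEnvelope[OrderEvent]): Future[Done] =
    producer
      .send(new ProducerRecord("orders", envelope.persistenceId, envelope.event.payload))
      .map(_ => Done)
}

def orderProjection(implicit system: ActorSystem[_]) = {
  implicit val ec: ExecutionContext = system.executionContext
  val producer =
    SendProducer(ProducerSettings(system, new StringSerializer, new StringSerializer))

  // At-least-once: the offset is stored in Cassandra after the handler succeeds.
  val projection = CassandraProjection.atLeastOnce(
    ProjectionId("orders", "publish-to-kafka"),
    EventSourcedProvider.eventsByTag[OrderEvent](system, CassandraReadJournal.Identifier, tag = "orders"),
    handler = () => new PublishToKafka(producer))

  // run it, e.g. context.spawn(akka.projection.ProjectionBehavior(projection), "orders-projection")
  projection
}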

posthumecaver