
In this example from the Akka Persistence documentation:

    val receiveRecover: Receive = {
      case evt: Evt                                 => updateState(evt)
      case SnapshotOffer(_, snapshot: ExampleState) => state = snapshot
    }

    val snapShotInterval = 1000
    val receiveCommand: Receive = {
      case Cmd(data) =>
        persist(Evt(s"${data}-${numEvents}")) { event =>
          updateState(event)
          context.system.eventStream.publish(event)
          if (lastSequenceNr % snapShotInterval == 0 && lastSequenceNr != 0)
            saveSnapshot(state)
        }
      case "print" => println(state)
    }

I understand that this lambda:

    event =>
      updateState(event)
      context.system.eventStream.publish(event)
      if (lastSequenceNr % snapShotInterval == 0 && lastSequenceNr != 0)
        saveSnapshot(state)

is executed when the event has been successfully persisted. What if the actor crashes while this lambda is being executed, BEFORE the event has been successfully published, i.e. before context.system.eventStream.publish(event)?

Do I understand correctly that in that case the event is never published, which may lead to an inconsistent state of the system? If so, is there any way to detect that such a thing has happened?

[EDIT]

Also, if you use event publishing in your system, then correct me if I am wrong:

  1. If your application is deployed in one JVM and you use Akka's default event publishing facilities, then a JVM crash will mean that all events published but not yet processed will be lost, since that facility does not have any recovery mechanism.

  2. If your application is deployed in a cluster, then you'll run into the same situation as above only if the whole cluster goes down.

  3. For any production setup you should configure something like Kafka for event publishing/consuming. A rough sketch of what I mean follows this list.
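
To make point 3 a bit more concrete, here is roughly the kind of setup I have in mind: read the events back out of the journal (the source of truth) with Akka Persistence Query and push them to Kafka, resuming from a stored offset. This is only a sketch, not something from the Akka docs; it assumes the LevelDB journal and read journal are configured, and publishToKafka / saveOffset are made-up placeholders.

    import akka.actor.ActorSystem
    import akka.persistence.query.PersistenceQuery
    import akka.persistence.query.journal.leveldb.scaladsl.LeveldbReadJournal
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.Sink

    import scala.concurrent.Future

    object PublishFromJournal extends App {
      implicit val system: ActorSystem = ActorSystem("example")
      implicit val mat: ActorMaterializer = ActorMaterializer()
      import system.dispatcher

      // Made-up placeholders: push one event to Kafka, remember how far we got.
      def publishToKafka(event: Any): Future[Unit] = Future { println(s"to Kafka: $event") }
      def saveOffset(sequenceNr: Long): Future[Unit] = Future.successful(())

      val readJournal = PersistenceQuery(system)
        .readJournalFor[LeveldbReadJournal](LeveldbReadJournal.Identifier)

      // Replay the journal instead of relying on eventStream.publish; resuming
      // from a saved offset after a crash gives at-least-once delivery.
      readJournal
        .eventsByPersistenceId("example-persistence-id", 0L, Long.MaxValue)
        .mapAsync(parallelism = 1)(env => publishToKafka(env.event).map(_ => env.sequenceNr))
        .mapAsync(parallelism = 1)(saveOffset)
        .runWith(Sink.ignore)
    }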

artur

1 Answer


I understand that this lambda:

...

is executed when the event has been successfully persisted. What if the actor crashes while this lambda is being executed, BEFORE the event has been successfully published, i.e. before context.system.eventStream.publish(event)?

The lambda is run after the event has been persisted. And the actor essentially suspends itself (putting all pending work in the stash) until that persistence is complete, so that it remains consistent.
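
To make that suspension behaviour concrete, here is a minimal sketch (not the documentation example; Cmd and Evt mirror the question's types, Ack is a hypothetical reply, and a configured journal is assumed):

    import akka.persistence.PersistentActor

    final case class Cmd(data: String)
    final case class Evt(data: String)
    final case class Ack(data: String)

    class SketchActor extends PersistentActor {
      override def persistenceId: String = "sketch-actor-1"

      private var events: List[String] = Nil

      override def receiveRecover: Receive = {
        case Evt(data) => events = data :: events
      }

      override def receiveCommand: Receive = {
        case Cmd(data) =>
          // persist: commands that arrive while the write is in flight are
          // stashed until this handler has run, keeping the actor consistent.
          persist(Evt(data)) { event =>
            events = event.data :: events
            sender() ! Ack(event.data) // hypothetical acknowledgement
          }

        case ("async", data: String) =>
          // persistAsync: incoming commands are NOT stashed; the handler still
          // runs only after the event has been written to the journal.
          persistAsync(Evt(data)) { event =>
            events = event.data :: events
          }
      }
    }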

Do I understand correctly that in that case the event is never published, which may lead to an inconsistent state of the system?

No, it will remain consistent for the above reason.

If your application is deployed in one JVM and you use Akka's default event publishing facilities, then a JVM crash will mean that all events published but not yet processed will be lost, since that facility does not have any recovery mechanism.

I guess it depends on what you mean by default event publishing. For regular actors, yes: if you lose the JVM you lose "regular" actors. Regular actors live in memory, essentially like normal Java/Scala objects. Persistent actors are, of course, a different story.

You also say "published but not yet processed". Those, of course, are lost as well. Anything that isn't "processed" is essentially like a JDBC statement that hasn't been received by the database yet, or a message not transmitted to Kafka, etc. The design is essentially to save the event to the database immediately (almost like a transaction log) and then do the work after it is known to be safely persisted.
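
To be clear about the "default event publishing" part: the event stream is purely in-memory publish/subscribe within one ActorSystem. A minimal sketch (the EventLogger subscriber is made up; Evt mirrors the question's event type):

    import akka.actor.{ Actor, ActorSystem, Props }

    final case class Evt(data: String)

    // A plain in-memory subscriber. The publish(event) call in the question's
    // persist handler delivers only to subscribers that are alive, in the same
    // JVM, at that moment -- at-most-once, nothing is stored or replayed.
    class EventLogger extends Actor {
      override def preStart(): Unit =
        context.system.eventStream.subscribe(self, classOf[Evt])

      def receive: Receive = {
        case Evt(data) => println(s"observed: $data")
      }
    }

    object EventStreamDemo extends App {
      val system = ActorSystem("example")
      system.actorOf(Props[EventLogger], "event-logger")
      // The question's persist handler would then do:
      //   context.system.eventStream.publish(event)
      // which reaches this subscriber only if it is running in the same JVM.
    }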

If your application is deployed in a cluster, then you'll run into the same situation as above only if the whole cluster goes down.

A cluster essentially just gives a place for the persistent actor to recover. The cluster still relies on the persistent store for recovery.

(I'm keeping this answer focused on Akka Persistent Actors, the answers get more varied with things like Distributed Data.)

For any production setup you should configure something like Kafka for event publishing/consuming.

Not necessarily. The persistence module is definitely a consistent option. Kafka and Akka are really just different animals with different goals. Kafka is effectively pub/sub, while Akka takes a much more event-sourced approach. I've worked on systems that use both, but they use them for very different purposes.

David Ogren
  • I understand that the system will be consistent as far as persisting the event goes. But if persisting goes fine yet the actor crashes BEFORE publishing the event, then the system is not consistent, because the event could not be observed and reacted to anywhere else. When the actor recovers its state from its event store it will not (re)publish the events, so the system is left inconsistent and there is no way of detecting that it happened. It's indistinguishable from the scenario where the event was published and handled. Eventual consistency design can't work like this.... – artur May 29 '19 at 15:21
  • By "default event publishing" I mean the default event bus that comes with akka, which AFAIU has no durability guarantees, if JVM goes down, the events are gone. That's why I wonder if for events that CAN'T be lost, to make that guarnatee you need a "persistent" stream, like for example Kafka and plug it in your actor system to use it for event delivery, so that after crashes your events are sitting safely in Kafka to be consumed. Or do I just not get it :) – artur May 29 '19 at 15:28
  • So perhaps I wasn't very clear in my question. I understand that if you consider a single actor that persists its state, then all is consistent, because the persisting is atomic. However, if you design around eventual consistency with event publishing, that's where I am fuzzy about what happens in various sad panda scenarios. Because AFAICS the examples given in the documentation can lead to systems that are not consistent. – artur May 29 '19 at 15:34
  • 1
    I don't think I'm going to be able to clear it up from the existing question. You don't lose published events in the result of JVM failure (or even complete cluster failure). But Akka isn't an event bus or durable log. You might want to look at Lagom framework as it shows how you can combine Akka for strongly consistent entities and Kafka for eventually consistent read side projections. It's an example of using each at what they are good at. – David Ogren Jun 03 '19 at 16:24
  • Can you just elaborate on what mechanism there is in the default publishing infrastructure built into Akka for not losing published events, when the JVM running the actor system crashes before the event is published or before it is handled? AFAIU the delivery guarantees here are "at most once", but even if you plug in something that gives you an "at least once" guarantee, I still see the lambda failing before or on the publishing call. – artur Jun 04 '19 at 22:25
  • I read the [relevant Lagom documentation](https://www.lagomframework.com/documentation/1.5.x/java/ReadSideCassandra.html) and AFAIU they use Persistence Query + event tagging + event offsets to guarantee "at least once" delivery of events (or "exactly once" if you persist the offset atomically with your updates on the read model). At least for the Cassandra implementation. I guess there's some notification mechanism in Cassandra on DB writes, or else they do polling. Not that it proves anything, but I guess they wouldn't do all that stuff if vanilla Akka could make the same guarantees on event publishing. – artur Jun 04 '19 at 22:36
  • This seems relevant: https://stackoverflow.com/questions/42261834/lagom-message-durability-between-persistent-actor-and-read-processor – artur Jun 20 '19 at 08:32