Can I use Akka Persistence when the actor state only increases in size?

Question

I'm playing with Akka Persistence, trying to implement a service whose state is a potentially very big list of entities (let's say it won't fit in RAM). Let's say users want the full history of every entity to be available. Can I do that with Akka Persistence?

Right now my actor state looks like this:

// Actor state: every process ever seen, keyed by id. Closed processes are only
// flagged, never removed, so this map only ever grows.
case class System(processes: Map[Long, Process] = Map()) {

  // Pure event handler: returns a new state via copy instead of mutating this one,
  // so the field can be a val rather than a var.
  def updated(event: Event): System = event match {
    case ProcessDetectedEvent(time, activitySets, id, processType) =>
      val process = Process(activitySets.coordinates, time, activitySets.channels, id, processType, false)
      copy(processes = processes + (id -> process))

    case ProcessMovedEvent(id, activitySets, time) =>
      // processes(id) throws NoSuchElementException if this id was never detected
      val process = Process(activitySets.coordinates, time, activitySets.channels, id, processes(id).processType, false)
      copy(processes = processes + (id -> process))

    case ProcessClosedEvent(time, id) =>
      val currentProcess = processes(id)
      val process = Process(currentProcess.coordinates, time, currentProcess.channels, id, currentProcess.processType, true)
      copy(processes = processes + (id -> process))

    case _ => this
  }

}

As you can see, the map of processes is held in memory, so the application can run out of memory if the number of processes grows large.

I've been studying event sourcing (ES) for the past few months, and I think you should really try to have a ProcessRootAggregate which spawns child ProcessAggregates instead of storing all the Processes in one PersistentActor. Your ProcessRootAggregate should send messages to all its child actors (ProcessAggregates) to perform some computation. Have a look at this template https://github.com/ScalaConsultants/akka-persistence-eventsourcing. — user794783, Oct 26 '15 at 07:43
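A minimal, library-free sketch of the root/child idea, assuming a simplified event model: `ProcessRootAggregate` and `ProcessAggregate` reuse the comment's names, while the events, the `Journal` class (an in-memory stand-in for a per-entity journal) and the `maxLive` limit are hypothetical and not Akka APIs. Each child only holds one process, and the root keeps a bounded number of children in memory, recovering evicted ones from their own short event stream.

```scala
import scala.collection.mutable

object PerProcess {
  // Hypothetical events, heavily simplified from the question's types.
  sealed trait ProcessEvent { def id: Long }
  final case class Detected(id: Long, time: Long) extends ProcessEvent
  final case class Moved(id: Long, time: Long) extends ProcessEvent
  final case class Closed(id: Long, time: Long) extends ProcessEvent

  // State of ONE process: small and bounded, unlike the global map.
  final case class ProcessAggregate(id: Long, lastTime: Long, closed: Boolean) {
    def updated(e: ProcessEvent): ProcessAggregate = e match {
      case Detected(_, t) => copy(lastTime = t)
      case Moved(_, t)    => copy(lastTime = t)
      case Closed(_, t)   => copy(lastTime = t, closed = true)
    }
  }

  // Stand-in for the journal: one event stream per process id.
  final class Journal {
    private val streams =
      mutable.Map.empty[Long, Vector[ProcessEvent]].withDefaultValue(Vector.empty)
    def append(e: ProcessEvent): Unit = streams(e.id) = streams(e.id) :+ e
    def replay(id: Long): Vector[ProcessEvent] = streams(id)
  }

  // Root: routes events to the child for their id, keeps at most `maxLive` children
  // in memory and recovers evicted ones by replaying only that child's events.
  final class ProcessRootAggregate(journal: Journal, maxLive: Int) {
    private val live = mutable.LinkedHashMap.empty[Long, ProcessAggregate]

    private def recover(id: Long): ProcessAggregate =
      journal.replay(id).foldLeft(ProcessAggregate(id, 0L, closed = false))(_ updated _)

    def handle(e: ProcessEvent): ProcessAggregate = {
      val current = live.remove(e.id).getOrElse(recover(e.id))
      journal.append(e) // persist first, then apply
      val next = current.updated(e)
      live(e.id) = next // re-inserted, so it is now the most recently used
      if (live.size > maxLive) live.remove(live.head._1) // evict the least recently used
      next
    }

    def query(id: Long): ProcessAggregate = live.getOrElse(id, recover(id))
    def liveCount: Int = live.size
  }
}
```

Memory now scales with `maxLive` rather than with the total number of processes, while the full history stays in the journal; the cost is a replay whenever an evicted process is touched again.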

score 0 · Answer 1 · answered Oct 12 '15 at 13:14

Akka Persistence is used by a stateful actor to recover its internal state when the actor is started, restarted (after a JVM crash or by a supervisor), or migrated in a cluster. In your case the application/JVM may crash with an OutOfMemoryError in the long run. When the actor restarts, the persistence recovery mechanism recreates all the Process entries in the map, so total memory usage is high again and the application can crash again while running. So persistence on its own does not help you avoid the crash, unless you persist (or recover) only a partial list of processes to reduce memory usage.
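To make the point concrete, here is a small sketch, assuming a simplified event type and a `String` in place of the real `Process` (both hypothetical): recovery is just a fold of every journaled event through the same handler, so the recovered map is exactly as large as the one that caused the crash.

```scala
object Recovery {
  // Hypothetical, simplified event: a process with this id was detected.
  final case class ProcessDetected(id: Long)

  // Same shape as the question's System: one map holding every process ever seen.
  final case class SystemState(processes: Map[Long, String] = Map.empty) {
    def updated(e: ProcessDetected): SystemState =
      copy(processes = processes + (e.id -> s"process-${e.id}"))
  }

  // Recovery = folding the whole journal through the same event handler,
  // so the rebuilt map is exactly as large as it was before the crash.
  def recover(journal: Iterable[ProcessDetected]): SystemState =
    journal.foldLeft(SystemState())(_ updated _)
}
```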

So first you need to find a way to avoid running out of memory. Maybe you can try the following options:

Increase the JVM heap size (for example with -Xmx) when restarting the JVM after the OutOfMemoryError.
While recovering the state, replay only a selected subset of messages, so that total memory use stays low, at the cost of incomplete state.
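Classic Akka Persistence lets a persistent actor limit its replay by overriding `recovery` with `Recovery(toSequenceNr = ..., replayMax = ...)`. The sketch below only models that behaviour without Akka, assuming a hypothetical `Journaled` record with a sequence number, to show that the recovered state is deliberately incomplete.

```scala
object PartialRecovery {
  // Hypothetical journal entry: a sequence number plus the event itself.
  final case class Journaled[E](seqNr: Long, event: E)

  // Replays at most `replayMax` events with sequence number <= `toSequenceNr`,
  // loosely mirroring the two limits of Akka's Recovery(...); everything else is skipped.
  def recover[S, E](journal: Seq[Journaled[E]], empty: S, toSequenceNr: Long, replayMax: Long)(
      handler: (S, E) => S): S =
    journal.iterator
      .filter(_.seqNr <= toSequenceNr)
      .take(replayMax.min(Int.MaxValue).toInt)
      .map(_.event)
      .foldLeft(empty)(handler)
}
```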

If the list of messages to be replayed during recovery is too large, snapshots can be used to reduce the state recovery time.

I think my question was not clear enough. I can't have incomplete state; that's the whole point of the question. I've already found one solution: don't keep the state in memory in the actor, but instead write it directly to a database that is used to query the state. That's basically CQRS. What I don't understand, though, is why basically none of the tutorials and examples on Akka Persistence say anything about this. They build some customer/order system in the tutorials, but they never remove anything from memory. That's confusing. — user1685095, Oct 13 '15 at 09:48
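A rough sketch of that CQRS split, assuming a hypothetical `ReadStore` trait with an in-memory implementation standing in for the real database: the write side only persists events, and a projection applies each event to the store, so queries never need the full map in the actor's memory.

```scala
import scala.collection.mutable

object Cqrs {
  sealed trait Event { def id: Long }
  final case class ProcessDetected(id: Long, time: Long) extends Event
  final case class ProcessClosed(id: Long, time: Long) extends Event

  final case class ProcessRow(id: Long, lastTime: Long, closed: Boolean)

  // Hypothetical read-side store; in practice a database table keyed by id.
  trait ReadStore {
    def upsert(row: ProcessRow): Unit
    def get(id: Long): Option[ProcessRow]
  }

  // In-memory stand-in used only to keep the sketch self-contained.
  final class InMemoryReadStore extends ReadStore {
    private val rows = mutable.Map.empty[Long, ProcessRow]
    def upsert(row: ProcessRow): Unit = rows(row.id) = row
    def get(id: Long): Option[ProcessRow] = rows.get(id)
  }

  // Projection: applies each persisted event to the read store. The write-side
  // actor keeps no map of processes; queries go to the store instead.
  def project(store: ReadStore)(e: Event): Unit = e match {
    case ProcessDetected(id, t) => store.upsert(ProcessRow(id, t, closed = false))
    case ProcessClosed(id, t) =>
      store.get(id).foreach(r => store.upsert(r.copy(lastTime = t, closed = true)))
  }
}
```

With a real database the projection would usually run asynchronously from the journal, so the read side is eventually consistent with the events.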
The answer, which applies to CQRS as well, is that these patterns cover scenarios that fit nicely into the ideal world (we all know the real world is slightly different, though). — cdmdotnet, Oct 22 '15 at 18:21

score 0 · Answer 2 · answered Oct 31 '15 at 20:45

Perhaps you want to think about whether there are meaningful ways to partition your data set into scopes that do have some reasonable bounds. Then you could represent each scope by a persistent actor, and if you need information that spans your entire store, you would need some kind of coordinator to manage the scopes and iterate over them. But depending on how sophisticated this gets, at some point I'd wonder whether you're reinventing MapReduce or Spark.
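A small sketch of the scope/coordinator idea, assuming a hypothetical scope key (a fixed-size range of ids) and plain classes in place of persistent actors: each `Scope` holds a bounded slice of the data, and the `Coordinator` answers a store-wide question by visiting every scope and combining their partial answers, which is the map-reduce shape mentioned above.

```scala
object Scopes {
  final case class Proc(id: Long, closed: Boolean)

  // One bounded "scope"; in the answer's terms each would be its own persistent actor.
  final class Scope(val key: Long) {
    private var procs = Map.empty[Long, Proc]
    def update(p: Proc): Unit = procs += (p.id -> p)
    def openCount: Int = procs.values.count(!_.closed)
  }

  // Coordinator: routes by a hypothetical scope key (id range) and answers
  // store-wide queries by visiting every scope and combining the partial results.
  final class Coordinator(scopeSize: Long) {
    private var scopes = Map.empty[Long, Scope]

    private def scopeFor(id: Long): Scope = {
      val key = id / scopeSize
      scopes.getOrElse(key, { val s = new Scope(key); scopes += (key -> s); s })
    }

    def update(p: Proc): Unit = scopeFor(p.id).update(p)
    def scopeCount: Int = scopes.size
    def totalOpen: Int = scopes.values.map(_.openCount).sum // map per scope, then reduce
  }
}
```

Picking the scope key is the real design decision: a key that matches how the data is queried (for example a time window) keeps most queries inside a single scope.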

score -1 · Answer 3 · answered Oct 22 '15 at 18:28

I think what you might be looking for (at least it's another option) is snapshots.

When using event sourcing and event replay, the generally advised approach is to take a snapshot every so often.

So when you load your events, you get back the latest snapshot and then only the events that took place since that snapshot. This means fewer objects are streamed from your event store (less memory) and fewer events have to be processed and applied (faster). This does come with its own trade-offs, which I won't discuss here.
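Akka Persistence does ship snapshot support (a `PersistentActor` can call `saveSnapshot`, and recovery first delivers a `SnapshotOffer`, then the newer events). The sketch below models only the recovery rule without Akka, assuming a hypothetical `Snapshot` record: start from the snapshot's state and replay just the events after its sequence number, which yields the same result as a full replay.

```scala
object Snapshots {
  // Hypothetical snapshot record: the state as of sequence number `seqNr`.
  final case class Snapshot[S](seqNr: Long, state: S)

  // Full replay: fold every (seqNr, event) pair from the beginning.
  def fullReplay[S, E](events: Seq[(Long, E)], empty: S)(h: (S, E) => S): S =
    events.map(_._2).foldLeft(empty)(h)

  // Snapshot-based recovery: start from the snapshot's state and replay only
  // the events whose sequence number is greater than the snapshot's.
  def fromSnapshot[S, E](snap: Snapshot[S], events: Seq[(Long, E)])(h: (S, E) => S): S =
    events.collect { case (seqNr, e) if seqNr > snap.seqNr => e }.foldLeft(snap.state)(h)
}
```

Note that `snap.state` is still the complete state, so snapshots shorten replay but do not shrink the in-memory footprint.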

This again only covers the most common scenarios. If your event handling changes, you may need to rebuild your events, although that would raise some rather serious and interesting questions about how you are building your system.

I haven't looked too closely, but Akka might have this notion of snapshotting built in. If not, there's a learning curve and lots of trial and error ahead, as you start hitting all the roads less travelled that ideal approaches skip but the real world throws at you.

I'm sorry, but this answer doesn't address my question about what to do if the actor state only increases in size. The snapshot would be loaded into memory, and if it's bigger than memory can hold, it wouldn't work. — user1685095, Oct 28 '15 at 13:00
