Process messages at specific time reliably

Question

Let's assume I have a chat application.

A client sends a message to the chat, which results in a command being sent to some Actor. I want to process what the user wrote immediately and make it available to the other users in the chat, so I process this command right away. At the same time, I want to tell myself (an actor) that this message needs to be stored in the chat history database, BUT not right now. Saving to the database should happen every 2 minutes. And if there is a crash, I should still be able to persist the messages to the database.

I assume the workflow would be like this:

1. The user sends a message.
2. The chat room actor receives a command with this message.
3. We broadcast this message to everyone and add it to some kind of queue so it can be persisted to the chat history database.
4. A persist command runs when the 2-minute timeout has passed. It collects all incoming chat messages that haven't been persisted yet, in the order of their arrival.
5. It runs a transaction with all the messages and then removes them from the queue.
6. If there was a crash somewhere after step 3 and the messages weren't persisted, I should try to persist them again. If they were persisted, I should never try to persist them again.

How do I construct something like this in Akka? Which features and patterns should I use?

Take a look at http://letitcrash.com/post/28901663062/throttling-messages-in-akka-2 - this is actually #4-5 from your use case — abatyuk, Dec 20 '14 at 11:57

dk14 · Accepted Answer · 2015-01-11T00:22:11.380

You may need two actors: one (the coordinator) sends notifications about chat commands to clients; the other (the throttler) pushes data to the database every 2 minutes. Your queue will be just the internal state of the throttler:

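import akka.actor.{Actor, ActorRef}
import scala.collection.mutable
import scala.concurrent.duration._
import scala.util.control.NonFatal

case class Record(text: String) // a chat message; the field is illustrative
case object Start
case object Tick

class Coordinator(throttler: ActorRef) extends Actor {
   def broadcast(command: Record): Unit = ??? // deliver to chat clients (not shown)

   def receive = {
     case command: Record =>
          broadcast(command)  // make the message visible immediately
          throttler ! command // queue it for the next database flush
   }
}


class Throttler extends Actor {

  import context.dispatcher // ExecutionContext used by the scheduler

  val queue = mutable.ArrayBuffer.empty[Record] //it may be a cache instead

  def push(r: Record): Unit = ??? // write one record to the database (not shown)

  def schedule() = context.system.scheduler.scheduleOnce(2.minutes, self, Tick) // http://doc.akka.io/docs/akka/snapshot/scala/scheduler.html


  def receive = {
       case Start => schedule()
       case command: Record =>
           queue += command
       case Tick =>
          schedule() // arm the next flush before doing the work
          try {
            //---open transaction here---
            for (r <- queue) push(r)
            //---close transaction here---
            queue.clear() //will not be cleared in case of exception, so the records are retried on the next Tick
          } catch {
            case NonFatal(_) => // handler elided in the original; log the error and keep the queue
          }
  }
}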

You may also use an FSM-based implementation, as @abatyuk suggested.

If you need to decrease the load on the mailboxes, you could try some backpressure/load-balancing patterns like Akka Work Pulling.

If you want to protect against the failure of a node itself (to recover the queue state if some of your server nodes fail), you can use Akka Cluster to replicate the queue's state (manually). In that case the coordinator should be a Cluster Singleton; it should send the ticks itself to a random actor (you may use a bus for this) and track their successes and failures as a supervisor. Note that the supervisor's state may be lost too, so you should also replicate it across the nodes (and merge state between them every 2 minutes, so it's better to use a SortedSet for the queues, because merging will be something like sets.reduce(_ ++ _)).
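As a rough illustration of that merge step, here is a minimal sketch in plain Scala (no Akka involved). The `Record` shape, its `seq` field, and the `QueueMerge` object are assumptions made for this example only; the answer does not say how records are ordered or identified.

```scala
import scala.collection.immutable.SortedSet

// Hypothetical record type: `seq` is an assumed per-room sequence number
// used both for ordering and for de-duplication.
final case class Record(seq: Long, author: String, text: String)

object QueueMerge {
  // Compare records by sequence number only, so copies of the same message
  // held by different replicas collapse into a single element.
  implicit val bySeq: Ordering[Record] = Ordering.by[Record, Long](_.seq)

  // Union of all replica queues, kept in sequence order. This is the
  // sets.reduce(_ ++ _) idea written as a fold so that an empty list of
  // replicas yields an empty queue instead of throwing.
  def merge(replicas: Seq[SortedSet[Record]]): SortedSet[Record] =
    replicas.foldLeft(SortedSet.empty[Record])(_ ++ _)
}
```

Because the union is commutative and idempotent, it does not matter in which order the replicas are merged or how often the merge runs. This only holds under the assumption that two records with the same `seq` really are the same message.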

Storage systems like Riak already provide a simple way to solve the clustering problem, so you can use them as queues (and both the coordinator and the throttler will be "stateless" singletons). In the case of Riak you may configure it for Availability + Partition tolerance (see the CAP theorem), because merging data is not a problem here: your chat history is a CRDT (conflict-free replicated data type).

Another solution is to use a cache in write-behind mode (configured to flush every 2 minutes) as the throttler.
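To make the write-behind idea concrete, here is a small sketch in plain Scala. It is not the API of any particular cache product: `WriteBehindBuffer`, `put`, `flush`, and the `store` callback are all hypothetical names chosen for this example, and the class is not thread-safe (inside an actor that would not matter).

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical write-behind buffer: writes are accepted immediately and are
// handed to `store` in one batch when `flush()` is called, for example by a
// timer every 2 minutes. `store` stands in for the real database write.
final class WriteBehindBuffer[A](store: Seq[A] => Unit) {
  private val pending = ArrayBuffer.empty[A]

  // Accept a write without touching the backing store.
  def put(item: A): Unit = { pending += item; () }

  // Number of items not yet written to the backing store.
  def size: Int = pending.size

  // Write everything pending as one batch. The buffer is cleared only if
  // `store` returned normally; on failure the items stay for the next flush.
  def flush(): Try[Int] = {
    val batch = pending.toList
    Try(store(batch)).map { _ =>
      pending.clear()
      batch.size
    }
  }
}
```

A real write-behind cache would also replicate the pending entries across nodes, which is the part that protects against a crash. This sketch only shows the deferred, all-or-nothing flush.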

Event sourcing could also protect your actor's state, but it's more useful when you need to redo all actions after a restore (you don't need this here, since it would resubmit everything to the database). You can use snapshots (it's pretty much the same as using a cache directly), but it's better to save them to the cache (by implementing SnapshotStore) instead of the local FS if you care about availability. Note that you may also have to delete previously saved snapshots to reduce storage size. And you should save every record synchronously to avoid losing state.

P.S. Don't forget to acknowledge each message to the sender (to your JavaScript), or you could lose the last messages (in mailboxes) even with a cache as the queue.

P.S/2 A database is almost always a bad solution for persisting an actor's state, because it's slow and may become unavailable. I also wouldn't recommend strongly consistent NoSQL solutions like MongoDB; eventual consistency is the best choice in your case.

Is there any way to leverage existing Akka Persistence (Events, etc) to protect queue state? — bobby, Dec 20 '14 at 14:42
You can use Akka Persistence (saveSnapshot), but the problem is the same as with a DB: it may be pretty slow. You will also have to save every record synchronously to avoid losing state, and you will probably have to send it to a DB or cache (local files may not be reliable enough for you), so there is actually no big difference. Note that it will not solve the potential availability problem (you still need a cluster for this). — dk14, Dec 20 '14 at 16:06
Can't I use deleteMessages so it won't redo all actions? Then I won't need saveSnapshot. — bobby, Dec 20 '14 at 19:23
Or use event sourcing as is and implement database saving through custom SnapshotStore and save snapshot at 2 minutes interval. — bobby, Dec 20 '14 at 19:51
The problem is still the same: you have to save every message at the exact moment you receive it (not in bulk), so you can't use a DB for that. In the 1st case every message will be saved by `persist` (and you should notify the sender only when it (asynchronously) commits the data), so there is no sense in using a bulk transaction after that. In the 2nd case you can't save snapshots every 2 minutes, because you will lose your unsaved data in case of failure (your client can't wait 2 minutes for an acknowledgement; otherwise you don't need queue persistence at all). — dk14, Dec 20 '14 at 20:05
In general, a database is almost always a bad solution for state persistence, because it's slow and may become unavailable. I also wouldn't recommend strongly consistent NoSQL solutions like MongoDB. — dk14, Dec 20 '14 at 20:13
