Are serializers the right spot to remove shared state from Akka messages?

Question

I am working on a distributed algorithm and decided to use a Akka to scale it across machines. The machines need to exchange messages very frequently and these messages reference some immutable objects that exist on every machine. Hence, it seems sensible to "compress" the messages in the sense that the shared, replicated objects should not be serialized in the messages. Not only would this save on network bandwidth but it also would avoid creating duplicate objects in the receiver side whenever a message is deserialized.

Now, my question is how to do this properly. So far, I could think of two options:

Handle this on the "business layer", i.e., converting my original message objects to some reference objects that replace references to the shared, replicated objects by some symbolic references. Then, I would send those reference objects rather than the original messages. Think of it as replacing some actual web resource with a URL. Doing this seems rather straight-forward in terms of coding but it also drags serialization concerns into the actual business logic.
Write custom serializers that are aware of the shared, replicated objects. In my case, it would be okay that this solution would introduce the replicated, shared objects as global state to the actor systems via the serializers. However, the Akka documentation does not describe how to programmatically add custom serializers, which would be necessary to weave in the shared objects with the serializer. Also, I could imagine that there are a couple of reasons, why such a solution would be discouraged. So, I am asking here.

Thanks a lot!

The documentation says how to write custom serializer/deserializers ( http://doc.akka.io/docs/akka/current/java/serialization.html#customization ). Go with approach 2 imho. You can do it "programmatically" by using static stuff — Diego Martinoia, Jun 21 '17 at 08:56
Thanks! I came across the [SerializationSetup](http://doc.akka.io/api/akka/current/akka/serialization/SerializationSetup$.html) object, which I deemed to be the proper way of "programmatically" adding a serializer at runtime. That one is not found in the documentation, though. — Sebastian Kruse, Jun 21 '17 at 09:30
Should I convert the comment in an answer and we close the question or do you need more info? — Diego Martinoia, Jun 21 '17 at 10:29
I need some more thinking whether this answers my questions. While I do see the technical feasibility of this approach, I am not yet convinced that option 2 is the definitive way to go. But please go ahead and put it as an answer. — Sebastian Kruse, Jun 21 '17 at 10:52

score 1 · Accepted Answer · answered Jun 21 '17 at 11:39

It's possible to write your own, custom serializers and let them do all sorts of weird things, then you can bind them at the config level as usual:

class MyOwnSerializer extends Serializer {

  // If you need logging here, introduce a constructor that takes an ExtendedActorSystem.
  // class MyOwnSerializer(actorSystem: ExtendedActorSystem) extends Serializer
  // Get a logger using:
  // private val logger = Logging(actorSystem, this)

  // This is whether "fromBinary" requires a "clazz" or not
  def includeManifest: Boolean = true

  // Pick a unique identifier for your Serializer,
  // you've got a couple of billions to choose from,
  // 0 - 40 is reserved by Akka itself
  def identifier = 1234567

  // "toBinary" serializes the given object to an Array of Bytes
  def toBinary(obj: AnyRef): Array[Byte] = {
    // Put the code that serializes the object here
    //#...
    Array[Byte]()
    //#...
  }

  // "fromBinary" deserializes the given array,
  // using the type hint (if any, see "includeManifest" above)
  def fromBinary(
    bytes: Array[Byte],
    clazz: Option[Class[_]]): AnyRef = {
    // Put your code that deserializes here
    //#...
    null
    //#...
  }
}

But this raises an important question: if your messages all references data that is shared on the machines already, why would you want to put in the message the pointer to the object (very bad! messages should be immutable, and a pointer isn't!), rather than some sort of immutable, string objectId (kinda your option 1) ? This is a much better option when it comes to preserving the immutability of the messages, and there is little change in your business logic (just put a wrapper over the shared state storage)

for more info, see the documentation

score 0 · Answer 2 · answered Jun 23 '17 at 07:32

I finally went with the solution proposed by Diego and want to share some more details on my reasoning and solution.

First of all, I am also in favor of option 1 (handling the "compaction" of messages in the business layer) for those reasons:

Serializers are global to the actor system. Making them stateful is actually a most severe violation of Akka's very philosophy as it goes against the encapsulation of behavior and state in actors.
Serializers have to be created upfront, anyway (even when adding them "programatically").
Design-wise, one can argue that "message compaction is not a responsibility of the serializer, either. In a strict sense, serialization is merely the transformation of runtime-specific data into a compact, exchangable representation. Changing what to serialize, is not a task of a serializer, though.

Having settled upon this, I still strived for a clear separation of "message compaction" and the actual business logic in the actors. I came up with a neat way to do this in Scala, which I want to share here. The basic idea is to make the message itself look like a normal case class but still allow these messages to "compactify" themselves. Here is an abstract example:

class Sender extends ActorRef {
   def context: SharedContext = ... // This is the shared data present on every node.

   // ...

   def someBusinessLogic(receiver: ActorRef) {
     val someData = computeData
     receiver ! MyMessage(someData)
   }
}

class Receiver extends ActorRef {
   implicit def context: SharedContext = ... // This is the shared data present on every node.

   def receiver = {
     case MyMessage(someData) =>
       // ...
   }
}

object Receiver {
  object MyMessage {
    def apply(someData: SomeData) = MyCompactMessage(someData: SomeData)
    def unapply(myCompactMessage: MyCompactMessage)(implicit context: SharedContext)
    : Option[SomeData] =
      Some(myCompactMessage.someData(context))
  }
}

As you can see, the sender and receiver code feels just like using a case class and in fact, MyMessage could be a case class. However, by implementing apply and unapply manually, one can insert its own "compactification" logic and also implicitly inject the shared data necessary to do the "uncompactification", without touching the sender and receiver. For defining MyCompactMessage, I found Protocol Buffers to be especially suited, as it is already a dependency of Akka and efficient in terms of space and computation, but any other solution would do.

Are serializers the right spot to remove shared state from Akka messages?

2 Answers2