Serializing mutable state and sending it asynchronously over the network with nearly-zero-copy (Cap'n Proto + ZeroMQ)

Question

I have an application in which I'd like to send part of its mutable state over the network to another machine (there will be a cluster of those machines), run some CPU-intensive computations on it there, and get back the results: essentially asynchronous RPC. Such calls will happen many times during the execution of the program, so I'd like to keep the overhead as small as possible, e.g. by minimizing the number of redundant copies of the data. The size of the data varies from tens of bytes to hundreds of KBs, maybe even a few MBs. Its structure is relatively complex: it consists of a set of object trees, but the leaves contain only primitive types and the internal nodes contain minimal metadata.

I'm considering Cap'n Proto for serialization (though in this case I'd have to create a redundant model for my data), and ZeroMQ for transport. On the client/main application side I'd like to use azmq, because I need Boost.Asio's features (namely coroutine/fiber support). The language is C++.

Summarizing with a very rough sketch:

RelativelyComplexState data;
CapnProtoRequest cp_req = buildRequest(data); // traverses my data, creates C'n P object
azmq_socket.async_send(boost::asio::buffer(cp_req, cp_req.size)); //azmq always copies the buffer? Not good.
// do other stuff while request is being processed remotely
// get notification from azmq/Boost.Asio when reply has arrived
azmq::message msg; // `azmq::message msg();` would declare a function, not an object
azmq_socket.async_receive(some_message_handler?); // get all the data into msg
CapnProtoResponse cp_resp = parseResponse(msg.cbuffer()); // interpret bytes as C'n P object, hopefully no copy
RelativelySimpleResult result = deserialize(cp_resp);

Is this feasible, or is there a better way? Would a schemaless serialization method (e.g. Boost::Serialization) make my life easier and/or the application more efficient in this case?

Also, what is the best way to send and receive a Cap'n Proto object with ZeroMQ/azmq, avoiding unnecessary copies? By looking at the source code of azmq, it seems that for sending, azmq always copies the buffer contents. What are the more subtle issues (segmenting/framing, etc.)? I'm not familiar with the libraries and haven't found any explanation or good examples.

Thank you!

I wouldn't consider Boost Serializer _schema-less_. It's just that the schema is defined _in-code_. But you cannot expect to deserialize anything that wasn't serialized to the exact expected layout (not even with XML archives, e.g.) — sehe, Jan 26 '15 at 12:13
True, but I already have the class hierarchy (i.e. the in-code schema), thus I wouldn't have to define an additional schema, just extend the classes with the serialization methods. What do you mean by exact expected layout? AFAIK the tree structure can be arbitrary, and thanks to Boost::Serialization's pointer handling, it can be deserialized without problems. — remv, Jan 26 '15 at 15:23
"it can be deserialized without problems" doesn't come close to "it's schemaless". But as long as you know what you meant, I just wanted to avoid confusion over the wording in the question — sehe, Jan 26 '15 at 15:25
Thanks for the clarification! I will have the same class hierarchy on both the sender and receiver side, and this will be the schema, as you said. — remv, Jan 26 '15 at 16:15
I had to do something extremely similar but I used google flatbuffers, so you may find [AzmqFlatbuffer](https://github.com/ahundt/grl/blob/master/include/grl/AzmqFlatbuffer.hpp) of interest as a reference. — Andrew Hundt, Jul 29 '15 at 17:21
Thank you for the note, Andrew! In the end, I found a way to completely bypass Asio (I used only Boost Fiber+czmq). I was regularly sending out lots of requests, so after every send I just checked if there are any outstanding replies and received all of them in one batch. In the rare case when I had to wait for a reply, I did simple polling. Combined with a few other related changes, removing Asio made my code much more efficient. — remv, Aug 06 '15 at 13:33

score 11 · Accepted Answer · answered Jan 26 '15 at 18:25

I do not know much about ZeroMQ's interface but I can give advice on how to minimize copies from Cap'n Proto.

On the sending side, use capnp::MessageBuilder::getSegmentsForOutput() (capnp/message.h) to get direct pointers to the message's content without copying. This gives you an array of arrays of bytes (actually, words, but you can cast them to bytes). You need to somehow feed these to ZeroMQ without copying them. You'll need to make sure that the boundaries between segments are preserved -- the goal is to come up with exactly the same array of arrays on the receiving end. Maybe ZeroMQ has explicit support for multi-segment messages and can remember the segment boundaries for you; if not, you'll need to prefix your message with a table of segment sizes.
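If the transport does not preserve segment boundaries, the size-table idea above might look roughly like the sketch below. It uses only the standard library so that it stands alone: `Segment`, `SegmentView`, `makeSegmentTable` and `splitSegments` are hypothetical names, and the table layout (segment count followed by each segment's size in words) is an assumption chosen for illustration, not Cap'n Proto's own stream framing. In a real program the segments would come from getSegmentsForOutput() rather than from std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for one Cap'n Proto segment: a run of 64-bit words (hypothetical type).
using Segment = std::vector<std::uint64_t>;

// Non-owning view of one segment inside a received buffer (hypothetical type).
struct SegmentView {
    const std::uint64_t* words;
    std::size_t size;  // length in words
};

// Header to send ahead of the segments: [count, size0, size1, ...], sizes in words.
std::vector<std::uint64_t> makeSegmentTable(const std::vector<Segment>& segments) {
    std::vector<std::uint64_t> table;
    table.reserve(segments.size() + 1);
    table.push_back(segments.size());
    for (const Segment& s : segments) table.push_back(s.size());
    return table;
}

// Rebuild the segment boundaries on the receiving side without copying the payload.
std::vector<SegmentView> splitSegments(const std::vector<std::uint64_t>& table,
                                       const std::vector<std::uint64_t>& payload) {
    assert(!table.empty() && table.size() == table[0] + 1);
    std::vector<SegmentView> views;
    views.reserve(table.size() - 1);
    std::size_t offset = 0;
    for (std::size_t i = 1; i < table.size(); ++i) {
        const std::size_t size = static_cast<std::size_t>(table[i]);
        assert(size <= payload.size() - offset);  // table must not overrun the payload
        views.push_back(SegmentView{payload.data() + offset, size});
        offset += size;
    }
    assert(offset == payload.size());  // table must account for every word
    return views;
}
```

The views returned by `splitSegments` point into `payload`, so the received buffer has to outlive them. If the transport keeps frames separate (as multi-part messages do), the table is unnecessary and each frame can carry one segment.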

On the receiving side, once you have rebuilt your array of segments, construct a capnp::SegmentArrayMessageReader (capnp/message.h) and pass the array to the constructor. This will use the underlying data without copying. (Note that you will need to make sure that the data is aligned on a 64-bit boundary. I'm not sure if ZeroMQ guarantees this.)
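Since the alignment of a received buffer may not be guaranteed, one option is to check it and fall back to a copy only when needed. The following is a minimal standard-library sketch of that idea; `isWordAligned` and `alignedWords` are hypothetical helpers, not part of Cap'n Proto or ZeroMQ, and the returned pointer would still have to be cast to `const capnp::word*` before being handed to a reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if `p` sits on a 64-bit (8-byte) boundary.
bool isWordAligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 8 == 0;
}

// Returns a pointer to 8-byte-aligned data holding the same bytes.
// The input is used in place when already aligned; otherwise it is copied
// into `scratch`, which must then outlive any reader built on the result.
const void* alignedWords(const unsigned char* bytes, std::size_t byteCount,
                         std::vector<std::uint64_t>& scratch) {
    assert(byteCount % 8 == 0);  // a segment is a whole number of words
    if (isWordAligned(bytes)) return bytes;
    scratch.resize(byteCount / 8);
    std::memcpy(scratch.data(), bytes, byteCount);
    return scratch.data();
}
```

With this approach the copy is paid only for misaligned buffers, so the common case stays zero-copy if the transport happens to deliver aligned data.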

Note that if both your client and server are C++, you may want to consider using Cap'n Proto's own RPC protocol, which is easier to set up and already avoids all unnecessary copies. However, integrating Cap'n Proto's event loop with boost::asio is currently non-trivial. It's possible -- for example you can look at node-capnp which integrates Cap'n Proto with libuv's event loop -- but may be more work than you want to do.

(Disclosure: I'm the author of Cap'n Proto.)

Thank you for the clear explanation, Kenton! This information could be a useful addition to the Cap'n Proto site, too. Segments can be sent as ZMQ frames, so the separation is preserved, and zmq_msg_init_data() can be used for zero-copy. I've looked at Cap'n Proto RPC, and while it would make communication simpler, it isn't suitable for me because I need server clustering and coroutines/fibers in the clients (the latter is on the road map, I saw). — remv, Jan 27 '15 at 16:35
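One subtlety with zmq_msg_init_data() is lifetime: ZeroMQ reads the buffer later and calls the supplied free function, `void fn(void* data, void* hint)`, when it is done, so the object owning the segments must stay alive until every frame has been released. A possible way to arrange that, sketched here with the standard library only, is to pass a heap-allocated shared_ptr as the hint of each frame. `makeHint` and `releaseOwner` are hypothetical helpers, and the ZeroMQ calls appear only in comments.

```cpp
#include <cassert>
#include <memory>

// Matches the shape of ZeroMQ's zmq_free_fn: void fn(void* data, void* hint).
// `hint` is a heap-allocated shared_ptr that keeps the owner of `data` alive.
template <typename Owner>
void releaseOwner(void* /*data*/, void* hint) {
    assert(hint != nullptr);
    delete static_cast<std::shared_ptr<Owner>*>(hint);  // drops one reference
}

// Creates one keep-alive hint; call this once per frame that is sent.
template <typename Owner>
void* makeHint(const std::shared_ptr<Owner>& owner) {
    return new std::shared_ptr<Owner>(owner);
}

// Intended use, assuming `builder` is a std::shared_ptr to the message builder
// and `data`/`size` describe one of its segments (pointer cast to void*):
//
//   zmq_msg_t frame;
//   zmq_msg_init_data(&frame, data, size, releaseOwner<Builder>, makeHint(builder));
//   zmq_msg_send(&frame, socket, isLastSegment ? 0 : ZMQ_SNDMORE);
```

The owner is destroyed when the last reference goes away, which may happen inside the free function and therefore on a ZeroMQ I/O thread rather than the sending thread, so its destructor should be safe to run there.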
Today, I found one more issue with representing my data structure using a Cap'n Proto schema. The nodes in the tree have a common base class, but are of various subtypes, most of which have additional data members. In Cap'n Proto a way to represent this would be a single struct containing the common data members and a union with the optional ones incl. a void dummy element (and of course pointers to the child nodes). Is this the best way to do it? Thanks again (and also for the awesome software!) — remv, Jan 27 '15 at 16:36
Yes, a union is best if you have a fixed set of subclasses. On the other hand, if you want people to be able to extend your protocol without editing your code, then you'll need a field of type `AnyPointer`, and probably another field identifying the type in use (I suggest `typeId :UInt64`; use capnp::typeId() to get its 64-bit ID). — Kenton Varda, Jan 28 '15 at 04:43
I have accepted your answer Kenton, because you gave me lots of useful information regarding the Cap'n Proto parts. Thanks again! Maybe I'll have to open another question focusing on the Boost Asio aspects. PS: I can't upvote you, because I don't have enough rep. I'll have to work on that :) — remv, Jan 31 '15 at 13:29
