16

I'm writing a simple streaming JSON service. It consists of JSON messages, sent intermittently, for a long period of time (weeks or months).

What is the best practice for sending multiple JSON messages over a plain TCP socket?

Some alternatives I have looked at (and their downsides) are:

  1. newline separated JSON - downside: newlines within JSON require escaping, or prohibition
  2. websocket inspired 0x00 0xff framing - downside: it's now binary, not utf-8 anymore
  3. real websockets - downside: lack of (opensource) websocket client libraries
  4. http multipart http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html - downside: incomplete client support?
  5. no delimiters - downside: chunking requires JSON parsing (can't just count curlies because of curlies in strings)

Is there a good, or at least well-established way of doing this?

Sheena
fadedbee
  • what about opening/closing the socket between each message? – fvu Jul 04 '11 at 16:28
  • @fvu at peak times we may have ten or more messages per second, so this is not efficient. It could also cause NAT exhaustion on weak routers. – fadedbee Jul 04 '11 at 16:31
  • Why can't one count curlies? one could detect and avoid counting curlies in strings, couldn't one? – moala Aug 06 '14 at 11:00
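
The string-aware counting suggested in the last comment can be sketched roughly as follows. This is only an illustration: `findObjectEnd` is a hypothetical helper, it assumes every message is a well-formed JSON object (top-level `{ ... }`), and it assumes the input has already been decoded to text.

```javascript
// Hypothetical helper: returns the index just past the first complete
// top-level JSON object in `text`, or -1 if that object is still incomplete.
// Assumes well-formed input whose top-level value is an object.
function findObjectEnd(text) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string, only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}
```

This works as a splitter, but it is effectively a small tokenizer, which is the cost the question alludes to with "chunking requires JSON parsing".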

5 Answers

18

My first two options would be:

  1. Do what early TCP protocols do: send one message (a JSON object in your case) and close the connection. The client detects it and reopens to get the next object.

    • pros: very easy to parse, no extra (content) bytes sent. Any loss of data means losing just a single object; if you can tolerate that, there's no need to add retransmission to your app.
    • cons: if you send a (huge) lot of (very) small objects, the three-packet TCP handshake adds to latency.
  2. Do what chunked-mode HTTP does: first send the number of bytes in the JSON object, then a newline (CRLF in HTTP), and then your JSON object. The client just has to count bytes to know when the next byte starts the next object size.

    • pros: you keep one long-lived stream.
    • cons: a few extra bytes; you have to keep a long-lived stream, so accidental breaks and reconnections have to be handled as exceptional events, and you need to establish some handshaking to continue where it failed.
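
A minimal sketch of option 2, assuming Node.js `Buffer`s, a decimal length line terminated by a single `\n`, and UTF-8 payloads. The names `encodeFrame` and `createDecoder` are made up for illustration, and the reconnect/resume handshake mentioned in the cons is not covered.

```javascript
// Frame one message: decimal byte length, "\n", then the UTF-8 JSON payload.
function encodeFrame(value) {
  const payload = Buffer.from(JSON.stringify(value), "utf8");
  return Buffer.concat([Buffer.from(payload.length + "\n", "ascii"), payload]);
}

// Hypothetical incremental decoder: feed it whatever bytes the socket
// delivers; it returns the messages that are complete so far.
function createDecoder() {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    const messages = [];
    for (;;) {
      const newline = pending.indexOf(0x0a);
      if (newline === -1) break; // length line not complete yet
      const size = Number.parseInt(pending.toString("ascii", 0, newline), 10);
      if (!Number.isInteger(size) || size < 0) throw new Error("bad length line");
      const start = newline + 1;
      if (pending.length < start + size) break; // payload not complete yet
      messages.push(JSON.parse(pending.toString("utf8", start, start + size)));
      pending = pending.subarray(start + size);
    }
    return messages;
  };
}
```

Because the length counts bytes rather than characters, the decoder never has to look inside the payload to find where it ends, and TCP segment boundaries do not matter.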
Javier
  • You might consider what oberstet suggested. For my projects I normally use Netstrings or Bencoding-like framing. Easy to implement in most cases and only adds minimal overhead. – BastiBen Apr 06 '13 at 09:37
5

When you want to serve browser clients, the closest you get to raw TCP is WebSockets.

WebSockets has sufficient momentum that browser vendors will improve support (Chrome 14 and Firefox 7/8 support the latest protocol draft) and that a broad range of client and server frameworks will support it.

There are already a couple of open-source client libraries, including Autobahn WebSocket.

If you want to roll your own protocol (on top of raw TCP), I would recommend a length-prefixed format for your JSON messages, e.g. Netstrings.
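
For reference, a netstring is the payload's byte length in decimal, a colon, the payload bytes, and a trailing comma. A rough sketch of wrapping JSON that way, assuming Node.js `Buffer`s (the helper names `encodeNetstring` and `decodeNetstring` are illustrative, not part of any library):

```javascript
// Wrap one JSON value as a netstring: "<byte length>:<payload>,"
function encodeNetstring(value) {
  const payload = Buffer.from(JSON.stringify(value), "utf8");
  return Buffer.concat([
    Buffer.from(payload.length + ":", "ascii"),
    payload,
    Buffer.from(",", "ascii"),
  ]);
}

// Returns { value, rest } for the first netstring in `buf`,
// or null if the netstring has not fully arrived yet.
function decodeNetstring(buf) {
  const colon = buf.indexOf(0x3a); // ":"
  if (colon === -1) return null;
  const lengthText = buf.toString("ascii", 0, colon);
  if (!/^\d+$/.test(lengthText)) throw new Error("malformed netstring length");
  const end = colon + 1 + Number(lengthText);
  if (buf.length < end + 1) return null;
  if (buf[end] !== 0x2c) throw new Error("missing trailing comma");
  return {
    value: JSON.parse(buf.toString("utf8", colon + 1, end)),
    rest: buf.subarray(end + 1),
  };
}
```

The trailing comma carries no information, but it gives the receiver a cheap sanity check that sender and receiver still agree on where frames end.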

Disclaimer: I am the author of Autobahn and work for Tavendo.

oberstet
2

I've codified what I and some other developers are doing:

http://en.wikipedia.org/wiki/Line_Delimited_JSON

It has the advantage of being netcat/telnet compatible.

See also: http://ndjson.org/
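
A minimal reader sketch for this format, assuming the sender writes each object on a single line (`JSON.stringify` without an indent argument never emits a literal newline) and that incoming data has already been decoded to text. `createLineReader` is a hypothetical name.

```javascript
// Hypothetical incremental reader for newline-delimited JSON.
// Feed it decoded text chunks; it returns the objects completed so far.
function createLineReader() {
  let pending = "";
  return function push(text) {
    pending += text;
    const lines = pending.split("\n");
    pending = lines.pop(); // last piece is an unfinished line (or "")
    return lines
      .filter((line) => line.trim() !== "") // tolerate blank keep-alive lines
      .map((line) => JSON.parse(line));
  };
}
```

Each complete line is parsed exactly once, so the total work stays linear in the number of characters received as long as no object spans several lines.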

fadedbee
  • As I said on Wikipedia, it's not the best place to create original work; you should put it elsewhere. – Xavier Combelle Jul 12 '13 at 19:53
  • This certainly doesn't look like any "standard", nor something I would advise to anybody. The proposed implementations are either grossly inefficient (parsing everything so far on every line break: O(n^2)) or depend on "custom parsers". I just noticed that the Wikipedia page was written by chrisdew, and the "reference implementation" is his own. That violates 4 of the 7 content criteria. – Javier Jul 12 '13 at 20:16
  • @Javier I don't understand how the parsing is O(n^2); can you elaborate? As far as I understand, it is enough to do line = readline() and then parse(line), which are O(n) operations. – Xavier Combelle Sep 02 '13 at 07:09
  • @XavierCombelle then you repeat that once for each line. – Javier Sep 02 '13 at 13:55
  • @Javier if I have n lines of m characters, there are n parses costing O(m) each, which is O(n*m), i.e. O(N) where N = n*m is the total number of characters. – Xavier Combelle Sep 02 '13 at 15:44
  • @XavierCombelle the first time it tries a single line (n chars), if it fails, then it tries the first two lines (2n), then three (3n), total time is O(m*n^2/2) – Javier Sep 02 '13 at 17:14
  • IOW: it's a classical example of the [Shlemiel the painter's algorithm.](http://www.joelonsoftware.com/articles/fog0000000319.html) – Javier Sep 02 '13 at 17:21
  • @Javier there is no need to read more than one line at a time – Xavier Combelle Sep 02 '13 at 19:05
  • from the "definition" in wikipedia: "A simple implementation is to accumulate received lines. Every time a line ending is encountered, an attempt must be made to parse the accumulated lines into a JSON object." – Javier Sep 02 '13 at 19:07
  • Why not simply use `\x1e` (decimal 30, the ASCII record separator control character) as a delimiter between subsequent JSON serialized objects? This is still perfectly valid UTF-8. – oberstet Jan 15 '14 at 18:38
  • I'm using this format as a substitute for CSV files (especially useful when dealing with multidimensional data). It's way easier to work with this format from the Unix command line when you have huge data sets. JSON escapes all newlines, so you can safely grep/tail/split data to quickly get the parts that you need without decoding each line. It saves me days when sampling big data sets. – ivanhoe Mar 23 '15 at 16:36
  • There's already a standard for this: RFC 7464 (https://tools.ietf.org/html/rfc7464) – Joe Hildebrand Jun 13 '15 at 06:20
2

The first four bytes of the message can be a 32-bit integer indicating the size (in bytes) of the message. The receiver should then follow these steps:

  1. Read the first four bytes of data and work out the exact number of bytes you need to read to get the whole message.
  2. Read the rest of the message and deserialize it as JSON.

Sender code in C#:

    // Requires: using System; using System.Text;
    public void WriteMessage(Packet packet)
    {
        // Convert the object to JSON, encoded as UTF-8 bytes
        byte[] message = Encoding.UTF8.GetBytes(packet.Serialize());

        // Serialize the payload length in bytes (not characters).
        // BitConverter uses the machine's byte order, so the receiver must use the same one.
        byte[] messageLength = BitConverter.GetBytes(message.Length);

        // Build the full frame that holds both the size of the message and the message itself
        byte[] buffer = new byte[sizeof(int) + message.Length];

        // Copy the size into the start of the buffer
        Array.Copy(messageLength, 0, buffer, 0, sizeof(int));

        // Copy the message into the buffer, right after the size.
        // (The original called Array.Clear(message, ...) first, which zeroed the payload.)
        Array.Copy(message, 0, buffer, sizeof(int), message.Length);

        // Send it
        stream.Write(buffer, 0, buffer.Length);
    }
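
The two receiver steps can be sketched like this in JavaScript (Node.js `Buffer`s). It assumes the sender ran on a little-endian machine, since `BitConverter.GetBytes` uses the host byte order, and `createFrameReader` is a hypothetical name.

```javascript
// Hypothetical incremental reader for "4-byte length + JSON payload" frames.
// Assumes a little-endian signed 32-bit length, matching the C# sender
// when it runs on a little-endian machine.
function createFrameReader() {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    const messages = [];
    // Step 1: wait for the four length bytes.
    while (pending.length >= 4) {
      const size = pending.readInt32LE(0);
      if (size < 0) throw new Error("negative frame length");
      // Step 2: wait for the whole payload, then deserialize it.
      if (pending.length < 4 + size) break;
      messages.push(JSON.parse(pending.toString("utf8", 4, 4 + size)));
      pending = pending.subarray(4 + size);
    }
    return messages;
  };
}
```

In practice it may be safer for both sides to agree on an explicit byte order (network byte order is the usual choice) rather than rely on the sender's hardware.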
jvitoroc
1

You can use Server-Sent Events.

var source = new EventSource('/EventSource');

source.onmessage = function(e) {
  var data = JSON.parse(e.data);
  console.log(data);
};

source.onopen = function(e) {
  console.log('EventSource opened');
};

source.onerror = function(e) {
  console.log('EventSource error');
};
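
On the server side, each message goes out as a `data:` line followed by a blank line, over a response with the `text/event-stream` content type. A rough sketch, assuming a Node.js `http.ServerResponse` is passed in as `res`; both helper names are hypothetical.

```javascript
// Hypothetical helper: format one JSON message as a Server-Sent Event.
// JSON.stringify without indentation emits no literal newlines,
// so a single "data:" line is enough.
function formatSseEvent(value) {
  return "data: " + JSON.stringify(value) + "\n\n";
}

// Hypothetical handler body for the '/EventSource' route; `res` is assumed
// to be a Node.js http.ServerResponse (or anything with the same methods).
function startEventStream(res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // Call the returned function whenever a new message is available.
  return function send(value) {
    res.write(formatSseEvent(value));
  };
}
```

Note that this runs over HTTP rather than a bare TCP socket, so it mainly fits the case where the consumers are browsers.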
Fred