
For a serialization/protocol format I have to encode unsigned numbers, up to unsigned 64-bit integers, in a space-saving way that is still easy to implement (meaning I'm not looking for a dedicated compression algorithm). I was thinking about the following:

if n<128  
    take bits 0..6 for representing n, set overflow bit 7 to 0
    store one byte
if n>=128 and n<16384
    take bits 0..6 of byte 1 as bits 0..6 of n, set overflow bit 7 of byte 1 to 1
    take bits 0..6 of byte 2 as bits 7..13 of n, set overflow bit 7 of byte 2 to 0
    store byte 1 followed by byte 2
if n>=16384 and n<2^21
    ...set overflow bit 7 of byte 2 to 1... (and so on)
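
The scheme above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `encode_varint` and `decode_varint` are hypothetical, and the 10-byte limit in the decoder is an assumption derived from 64 bits at 7 payload bits per byte.

```python
def encode_varint(n: int) -> bytes:
    """Encode an unsigned integer below 2**64, low-order 7-bit group first."""
    if not 0 <= n < 2**64:
        raise ValueError("value out of range for an unsigned 64-bit integer")
    out = bytearray()
    while True:
        group = n & 0x7F              # bits 0..6 of what is left of n
        n >>= 7
        if n:
            out.append(group | 0x80)  # overflow bit 7 set: more bytes follow
        else:
            out.append(group)         # overflow bit 7 clear: this is the last byte
            return bytes(out)


def decode_varint(data: bytes) -> tuple:
    """Return (value, bytes_consumed) for one number at the start of data."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:           # overflow bit clear: number is complete
            if value >= 2**64:
                raise ValueError("value does not fit in 64 bits")
            return value, i + 1
        if i >= 9:                    # assumed cap: 10 bytes cover 64 bits
            raise ValueError("encoding longer than 10 bytes")
    raise ValueError("input ended before the last byte of the number")
```

For example, `encode_varint(300)` yields the two bytes `b"\xac\x02"`, and `decode_varint(b"\xac\x02")` returns `(300, 2)`.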

I have two questions about this:

  1. What is this format called? Where can I look up implementations?

  2. This is for a binary protocol that will be sent over sockets, where small numbers <128 will be sent very often. Do you think the extra processing is worth it?

user601395

2 Answers


Not the same as, but similar to UTF-8.

Edit

BTW: try and choose a known protocol. UTF-8, Huffman encoding...

cadrian
  • Thanks for the prompt reply. Sure, I want to use a known protocol; that's why I'm asking. I'm aware of the similarity to UTF-8, but this is for numbers. Using a byte-based format with 7 bits for encoding and 1 overflow bit is rather natural, and I find it unlikely that I've just invented it. Has no one seen this before? – user601395 Feb 03 '11 at 15:50

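Okay, after some more research I've finally found it. It's called a 'variable-length quantity' (VLQ) and is used in MIDI and ASN.1 (see the Wikipedia entry on variable-length quantity). One caveat: MIDI's VLQ stores the most significant 7-bit group first, whereas the layout in my question stores the least significant group first, which matches LEB128 and the base-128 varints used by Protocol Buffers.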

To answer my other question: I tend to believe it isn't worth the processing overhead, but I'm still pondering it.
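
To put rough numbers on the size side of that trade-off, here is a small sketch that compares the encoded length with a fixed 8-byte field. The helper name `varint_size` is hypothetical, and the comparison says nothing about CPU cost, which would need to be measured separately.

```python
def varint_size(n: int) -> int:
    """Bytes needed for n at 7 payload bits per byte (always at least one)."""
    return max(1, (n.bit_length() + 6) // 7)


# Compare against a fixed-width unsigned 64-bit field (8 bytes).
for n in (0, 127, 128, 16383, 16384, 2**32 - 1, 2**56 - 1, 2**56, 2**64 - 1):
    print(f"{n}: {varint_size(n)} byte(s) variable vs 8 fixed")
```

Values below 128 shrink from 8 bytes to 1, while values of 2^56 and above need 9 or 10 bytes, so the format only pays off if small numbers really do dominate the traffic.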

user601395